1 Introduction


This project began on 06/29/2001 with the goals of designing and testing architectures and
applications for a scalable, fault-tolerant quantum information processing system. It was a
collaborative computer science and physical sciences effort involving four groups working in
tight coordination: MIT provided experimental quantum technology parameters and fundamental
expertise in quantum information theory; U.C. Davis devised fault-tolerant architecture designs
and implemented numerical simulations; U.C. Berkeley created quantum cryptosystems and provided
distributed applications pull; and U. Washington focused on new languages for quantum
computation and an architectural simulator.

Our principal aim was to design a complete system architecture for a realistic, programmable,
arbitrary-scale quantum computer, focusing on maximum reduction of overhead and reaching
towards two targets: a solid-state spin-based quantum computer, and the application of quantum
information to real-world problems in secure distributed information storage.

We  report  the  following  major  accomplishments  over  the  span  of  this  project  timeline,  from 
06/29/01  to  03/31/06: 

• Detailed architecture design for large-scale, fault-tolerant quantum computers:

1. Identification of quantum wires and quantum communication requirements as a key
bottleneck in quantum processor design (ISCA'03 paper)

2. Design of quantum FPGA and micro-architecture processor layout for optimized
performance and reduced overhead (ISCA'05 and ISCA'06 papers)

• Software tool-chain for quantum computer architecture design and analysis:

1. Development of systematic scalability criteria for reliable large-scale quantum
computers (IEEE Computer '02)

2. Implementation of predictive quantum CAD tools for quantum architecture design and
evaluation (IEEE Computer '06)

• Enabling experimental results for quantum computation:

1. Realization of quantum optimization algorithm (Phys. Rev. Lett. '03)

2. Implementation of first standard quantum algorithm with a trapped ion (Nature '04)


Major  publications  detailing  these  results,  specifically  including  those  cited  here,  appear  in  the 
appendix  of  this  report  (cf  VIII  Article  28  C.2  of  the  project  agreement). 


2  Project  motivation,  approach,  and  timeline 


The main goal of this project was to understand what key interchangeable elements form
a scalable, fault-tolerant quantum information system architecture (Fig. 1). Our focus was squarely
on large-scale quantum computation, meaning a quantum computer capable of solving problems
such as factorization of 1024-bit numbers, which can reasonably be expected to require $O(10^6)$
qubits even when using perfect hardware. With imperfect hardware, quantum error correction is
required, which imposes an overhead of perhaps $10^3$ to $10^6$ times more qubits, depending on the
base error rate.
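
As a rough illustration of the scale these figures imply, the short Python sketch below multiplies the quoted logical qubit count by the two ends of the quoted overhead range. The function name and the choice of exactly 10^6 logical qubits as a representative value are assumptions made for illustration; the report gives only orders of magnitude.

```python
def physical_qubits(logical_qubits: int, overhead: int) -> int:
    """Total physical qubits when each logical qubit costs `overhead` physical ones."""
    return logical_qubits * overhead


LOGICAL = 10**6  # order-of-magnitude figure quoted for 1024-bit factoring (assumed exact here)

low = physical_qubits(LOGICAL, 10**3)   # low end of the quoted overhead range
high = physical_qubits(LOGICAL, 10**6)  # high end of the quoted overhead range
print(f"{low:.0e} to {high:.0e} physical qubits")
```

Under these assumptions the total lands between a billion and a trillion physical qubits, which is why reducing overhead is treated as a central design goal.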


Project Approach

Goal: Large-scale quantum computation

Approach: What key interchangeable elements form a scalable, fault-tolerant quantum
information system architecture?

Detailed focus issues:

1. What system design is necessary?

2. What crucial experiments / technology is missing?

3. What new applications are enabled?


Figure 1: Summary of main project problem and approach.


Our approach to addressing this problem was to focus on three main issues: (1) the system
design elements necessary for large-scale, reliable quantum computation, (2) the missing
experiments or technology necessary to realize complete quantum computing systems, and (3) new
applications which might be enabled by large-scale quantum computation.

Crucially,  we  realized  early  in  the  project  that  we  would  have  to  focus  on  designing  reliable 
quantum  computers  from  unreliable  parts,  because  of  the  inevitability  of  a  high  rate  of  errors 
in  quantum  systems  due  to  the  intrinsic  nature  of  decoherence.  Quantum  states  do  not  readily 
remain  in  superposition,  and  usually  require  active  correction  to  maintain  useful  stability. 


[Slide: Project Timeline, '01-'06. Original milestones (from project proposal) compared with
results. Legible entries: quantum digital signatures; design of practical, reliable QC (IEEE
Computer); quantum memory hierarchies (SPAA); quantum wires (ISCA'03); Deutsch-Jozsa algorithm
with Ca+ ion (Nature); DARPA Tech Significant Achievement '02; silicon QC analysis (JQE);
QIP'04 at MIT; design of Shor's algorithm in ion trap QC; Y5: design of experimental
realization; QFPGA design (ISCA'05); quantum processor design, two papers (ISCA'06); QC CAD
software toolchain (IEEE Computer).]


Figure 2: Timeline of major project results, in comparison with original goals.


The project started with five main goals (Fig. 2): design of a quantum architecture,
simulation of architectures constructed from faulty components, evaluation of a benchmark
application (Shor's algorithm), application to a large-scale distributed classical computing
system (OceanStore), and design and evaluation of a possible experimental realization of a full
large-scale, reliable quantum computing system.


3  Project  accomplishments  and  self-evaluation 


The  actual  goals  which  we  successfully  accomplished  during  this  project  matched  four  of  the  five 
original  goals  quite  well,  and  there  were  unanticipated  surprises  as  well  (Fig.  3). 

Accomplishments

Detailed architecture design for large-scale QC
Quantum wire / thresholds for QC (ISCA'03)
Quantum FPGA & micro-architecture (ISCA'05 & '06)

Software tool-chain for quantum computers
Scalability criteria for reliable QC (IEEE Computer '02)
Quantum CAD tools for design (IEEE Computer '06)

Enabling experimental results for QC
Realization of Q. optimization algorithm (PRL '03)
First Q. algorithm with trapped ion (Nature '04)


Figure 3: Summary of major project accomplishments.

The four originally anticipated goals which we accomplished were:

• The design of a variety of complete quantum architectures for large-scale, reliable quantum
computers, including identification of quantum wires, memories, control systems, and
spatial layout as key design elements (IEEE Computer '02, SPAA'03, and ISCA'03 & '05 papers)

• Simulation of architectures of reliable quantum computers, and the implementation of a
predictive design tool to analyze system reliability given technology parameters and
constraints (IEEE Computer '02 and '06 papers)

• Evaluation of the requirements and possible performance of Shor's factoring algorithm on a
complete benchmark quantum architecture design (ISCA'05 and '06 papers)

• Design of potential experimental realizations and implementation of actual experiments
to identify crucial parameters for fault-tolerant quantum architectures (Nature'04, JQE'03,
PRL'04, and ISCA'06 papers)

The single goal on which we made little progress was our hope to identify new applications
for large-scale quantum information processors in secure, distributed classical computing
applications such as OceanStore. In fact, early in the project, we proved that one of our original
ideas, to enable secure remote computation using quantum protocols, was not possible. However,
near the start of the project, in 2001, we successfully completed a study on quantum digital
signatures, which can be used for authentication and transferable authentication in multiparty
systems.

Two of the most interesting lessons we learned in this project were perhaps that (1)
fault-tolerance is a crucial concept which is widely invoked but not well understood, particularly
with respect to resource requirements, and (2) surprisingly, solid state (silicon based) quantum
computing was initially very promising, but ultimately unrealistic due to quantum communication
and wiring needs.

And  perhaps  the  most  interesting  unanticipated  success  we  had  was  with  another  technology: 
trapped  ion  quantum  computation.  Originally,  we  did  not  have  great  hopes  for  that  technology, 
but  upon  detailed  study  and  modeling,  it  turned  out  to  be  extremely  promising,  much  more  so 
than  other  currently  available  quantum  computing  technologies.  We  identified  this  promise  in 
2003,  just  as  experimental  successes  at  NIST  and  the  University  of  Innsbruck  were  coming  about, 
demonstrating  basic  quantum  protocols  such  as  quantum  teleportation,  quantum  error  correction, 
and  simple  quantum  algorithms,  such  as  the  Deutsch-Jozsa  algorithm  which  we  played  a  role  in 
making  possible. 

4  Summary  of  Main  Scientific  Results 

Three main accomplishments of this project deserve special identification: our results in
architecture design, scalability criteria, and software design tools for fault-tolerant quantum
architectures.

4.1  Quantum  architecture  concepts 

Modern  computer  architecture  is  about  the  optimization  of  one  central  goal:  parallelism.  Modern 
CPUs  employ  concepts  such  as  functional  unit  specialization,  speculative  execution,  H-tree  clock 
distribution,  and  subsystem  power  control,  to  maximize  performance  and  minimize  energy  cost. 
Quantum architecture focuses on a different central goal: reliability, because quantum noise is
an unavoidable fact in any realistic quantum computer implementation, and must be managed
carefully from a systems approach.


Architecture Concepts

Modern Q. computer arch. = seek reliability

[Diagram labels: Q. Ancilla Factory; 7-qubit Steane code; Classical Controller]

Knill, Nielsen & Chuang, Steane, Chong, Cross, Kubiatowicz, Oskin


Figure 4: Illustrative major result: concepts for quantum processor design.


One major result of this project was our introduction of new concepts for quantum
architecture (Fig. 4), including entropy exchange units, code conversion teleportation, quantum
wires, entanglement based clocks, and entanglement "power sources." From this work, we developed
a considerable understanding of the overall cost of fault tolerance in quantum computation, and
how this can be reduced through design improvements in balancing memory, computation, and
communication (see, in particular, our ISCA'02, ISCA'05, and ISCA'06 papers). Our results lead to
building-block based designs that are conceptually clean, buildable, and debuggable.


4.2  Scalability  criteria 

The overall system cost of fault-tolerance in quantum architectures is very high, but many
elements of this cost had largely been neglected in the community until our work. For example, the
number of wires required grows exponentially with the number of levels of concatenation used
in fault tolerance constructions, and moreover, these wires must generally connect gates spatially
separated by the entire size of the code block! Naturally, providing such communication
capability cannot come for free, but this cost was disregarded in all early feasibility claims for
quantum computing proposals. In particular, results early in our project (ISCA'03 and related
papers) identified the cost of such quantum communication needs as the primary pitfall in
constructing realistic quantum computing systems based on solid state devices with fixed qubits,
such as Kane's original impurity-based qubit scheme. Another widely neglected issue is the cost of
the classical control system needed to schedule, perform, and stabilize fault-tolerant quantum
gates. This control system must be much more reliable, faster, and more parallel than the quantum
system it controls, so for example, if qubits are running with a nanosecond timescale clock, the
classical control system should run with a subnanosecond timescale clock. For many quantum
computing technologies, this would require a control system that is far too power-hungry to be
realistic, due to cryogenic cooling requirements for the qubit implementation.
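
To make the exponential growth with concatenation level concrete, the following Python sketch counts the physical qubits in a single code block when a code is concatenated with itself. It assumes the 7-qubit Steane code shown in Fig. 4 and ignores ancilla qubits, wires, and control overhead, so it is a lower bound on block size under those assumptions rather than a model from our publications; the function name is hypothetical.

```python
def block_size(levels: int, code_length: int = 7) -> int:
    """Physical data qubits in one logical qubit after `levels` rounds of concatenation."""
    return code_length ** levels


# Each extra level multiplies the block size (and the distance a wire may
# have to span within the block) by the code length.
for level in range(1, 5):
    print(level, block_size(level))
```

Because communication must span the whole block, the wiring burden grows along with this count at every added level.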


Criteria for Scalable QC

Which QC implementation will be successful? Most ignore systems needs!

The system architecture drives device requirements:

1. Good quantum wires

2. Maximum parallelism

3. Local (fast) measurement

4. Fast (local) classical control

5. Complex state preparation

IEEE Computer: Jan 2002, p 79 (QARC, Quantum Architecture Research Center)


Figure 5: Illustrative major result: criteria for scalable fault-tolerant quantum computation.

More broadly, through our project work over four years, we have identified a set of five criteria
(Fig. 5), initially sketched in our IEEE Computer '02 article, which must be satisfied for a
fault-tolerant quantum architecture to obtain realistic performance. These criteria stipulate that
a good quantum computing system must satisfy not just the normal DiVincenzo criteria, but must
also have:

1. Good quantum wires: the ability to move quantum information between nearly any two
points in a quantum processor at a reliability high compared with gate error probabilities.


2.  Maximum  parallelism:  the  ability  to  perform  multiple  quantum  gates  in  a  single  timestep. 

3.  Local  (fast)  measurement:  the  ability  to  measure  nearly  any  qubit  in  the  system,  without 
requiring  more  than  a  constant  amount  of  movement,  and  in  a  time  comparable  to  a  single 
gate. 

4. Fast (local) classical control: facilities for gates to be applied to qubits at the proper time and
place, measurement results to be used in feedback to correct errors, code syndromes to be
extracted and voted upon, and qubits to be scheduled and moved for inter-gate communication.

5. Complex state preparation: the ability to prepare not just $|0\rangle$ states, but also complex states
such as logically encoded $|0_L\rangle$ and $|\pm_L\rangle$ states, Bell states, and cat states,
$|00\cdots0\rangle + |11\cdots1\rangle$.

The  lack  of  any  of  these  elements  will  likely  result  in  a  significant  worsening  of  the  fault  tolerance 
threshold  for  the  system,  compared  to  the  ideal  threshold  determined  by  code  properties.  We  have 
studied and quantified such costs in many of our publications; see, for example, the S.M. thesis of
Andrew Cross (available at the MIT DSpace archive permanent URL
http://hdl.handle.net/1721.1/30175).

4.3  Software  tool-chain  for  quantum  CAD 

Modern  computers  are  designed  first,  then  built  afterwards.  This  is  made  possible  by  the  use  of 
predictive  software  tools,  which  allow  computer  aided  design  of  models  which  accurately  reflect 
the performance of actual chip implementations. One major result of this project was the
development of several suites of new tools for predictive analysis and simulation of quantum
architectures (Fig. 6).

The ARQ Q. Design Tool

"Towards a software architecture for quantum computing design tools," K. Svore, A. Cross,
A. Aho, I. Chuang, I. Markov, PWQL, p. 145-162, 2004


Figure 6: Illustrative major result: modular architectural design tools for large-scale fault-tolerant
quantum computation.


Unlike classical CAD tools, which focus on just performance optimization and aiding in the
design of complex systems, these quantum CAD tools focus on achieving and evaluating
reliability. We began with an initial tool developed to study solid-state implementations, then
turned this into a full-blown analysis tool for evaluating trapped ion quantum computer
architectures. By simulating just quantum error correction circuits, a class which requires only
"Clifford group" gates that are easily classically simulated, we were able to simulate quantum
circuits with $O(1000)$ qubits efficiently.
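
One reason Clifford-only circuits are cheap to simulate is that a Pauli error stays a Pauli error as it passes through Clifford gates, so it can be tracked with two bits per qubit. The Python sketch below illustrates that general idea; it is not the ARQ implementation, and the class and method names are hypothetical. Global phases are ignored, which is the usual simplification when only the error pattern matters.

```python
class PauliFrame:
    """Tracks an n-qubit Pauli error as X and Z bit vectors (global phase ignored)."""

    def __init__(self, n: int):
        self.x = [0] * n
        self.z = [0] * n

    def inject(self, qubit: int, pauli: str) -> None:
        """Multiply a single-qubit error 'X', 'Y', or 'Z' into the frame."""
        if pauli in ("X", "Y"):
            self.x[qubit] ^= 1
        if pauli in ("Z", "Y"):
            self.z[qubit] ^= 1

    def h(self, q: int) -> None:
        """Hadamard exchanges X and Z errors."""
        self.x[q], self.z[q] = self.z[q], self.x[q]

    def s(self, q: int) -> None:
        """Phase gate maps X to Y: add a Z component whenever X is present."""
        self.z[q] ^= self.x[q]

    def cnot(self, c: int, t: int) -> None:
        """CNOT copies X errors control->target and Z errors target->control."""
        self.x[t] ^= self.x[c]
        self.z[c] ^= self.z[t]

    def label(self) -> str:
        names = {(0, 0): "I", (1, 0): "X", (0, 1): "Z", (1, 1): "Y"}
        return "".join(names[(x, z)] for x, z in zip(self.x, self.z))


# A single X error on qubit 0 spreads through a chain of CNOTs.
frame = PauliFrame(3)
frame.inject(0, "X")
frame.cnot(0, 1)
frame.cnot(1, 2)
print(frame.label())
```

Each gate costs a constant number of bit operations, so circuits on a thousand qubits are well within reach of this style of simulation.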

We deployed this on a Beowulf cluster to compute fault tolerance thresholds for trapped ion
quantum computers, analyzing the impact of specific technology parameters such as ion
movement, memory, waiting, one- and two-qubit gates, measurement, and preparation. Using this
tool, we computed a first set of new thresholds for fault-tolerant quantum computation in the
presence of realistic resource assumptions (Fig. 7).

This software tool has since evolved in several directions, including a separate branch
developed at U. Washington, and one at Columbia University in collaboration with Al Aho's group
there. His group introduced the idea of using feedback in the simulation to allow optimization of
thresholds. The tool has been used to teach students compiler optimization techniques,
and is now also a basis for development of a hardware "physical operations" language for basic
quantum computer operations.

Thresholds for Reliable QC

Figure 7: Illustrative major result: thresholds for fault-tolerant quantum computation


Based on these results, we believe similar tools can (and should!) be developed to accurately
predict fault tolerance thresholds for quantum computers implemented with other technologies,
such as solid state systems, and combinations of technologies. Just as for classical computers,
these tools will allow the performance of realistic quantum computing systems to be evaluated
and predicted, in advance of actual fabrication, and perhaps even in advance of technology
developments, as a strategic tool in directing investments in technology sectors.
Quantum computation has advanced to the point where system-level
solutions  can  help  close  the  gap  between  emerging  quantum  technologies 
and  real-world  computing  requirements. 
Quantum computers offer the prospect of computation that scales exponentially with data
size. Unfortunately, a single bit error can corrupt an exponential amount of data. Quantum
mechanics can seem more suited to science fiction than system engineering, yet small quantum
devices of 5 to 7 bits have nevertheless been built in the laboratory,[1,2] 100-bit devices are on
the drawing table now, and emerging quantum technologies promise even greater scalability.[3,4]

More importantly, improvements in quantum error-correction codes have established a
threshold theorem,[5] according to which scalable quantum computers can be built from faulty
components as long as the error probability for each quantum operation is less than some
constant (estimated to be as high as $10^{-4}$). The overhead for quantum error correction remains
daunting: current well-known codes require tens of thousands of elementary operations to
provide a single fault-tolerant logical operation. But proof of the threshold theorem
fundamentally alters the prospects for quantum computers. No principle of physics prevents their
realization; it is an engineering problem.

Empirical studies of practical quantum architectures are just beginning to appear in the
literature.[6] Elementary architectural concepts are still lacking: How do we provide quantum
storage, data paths, classical control circuits, parallelism, and system integration? And,
crucially, how can we design architectures to reduce error-correction overhead?

QUANTUM  COMPUTATION 

Quantum information systems can be a mathematically intense subject. We can understand a
great deal, however, by using a simple model of abstract building blocks: quantum bits, gates,
and algorithms, and the available implementation technologies, in all their imperfections.[7] The
basic building block is a quantum bit, or qubit, represented by nanoscale physical properties
such as nuclear spin. In contrast to classical computation, in which a bit represents either 0 or
1, a qubit represents both states simultaneously. More precisely, a qubit's state is described by
probability amplitudes, which can destructively interfere with each other and only turn into
probabilities upon external observation.

Quantum computers manipulate these amplitudes directly to perform a computation. Because
$n$ qubits represent $2^n$ states, a two-qubit vector simultaneously represents the states 00, 01,
10, and 11, each with some probability when measured. Each additional qubit doubles the number
of amplitudes represented; thus, the potential to scale exponentially with data size.
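
The doubling of amplitudes with each added qubit can be seen in a few lines of Python. The sketch below builds an equal-amplitude state over all basis strings and converts amplitudes to measurement probabilities by squaring them. The function name is hypothetical, and real amplitudes are used for simplicity even though amplitudes are complex in general.

```python
import math
from itertools import product


def uniform_superposition(n: int) -> dict[str, float]:
    """Equal-amplitude state over all 2**n basis strings of n qubits."""
    amp = 1 / math.sqrt(2 ** n)
    return {"".join(bits): amp for bits in product("01", repeat=n)}


state = uniform_superposition(2)
# Measurement probabilities are the squared magnitudes of the amplitudes.
probabilities = {basis: amp ** 2 for basis, amp in state.items()}
print(probabilities)
```

Calling `uniform_superposition(3)` yields eight entries rather than four, which is the exponential growth the text describes.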

A fundamental problem, however, is that we generally cannot look at the results of a quantum
computation until it ends, at which point we get only a random value from the vector. More
precisely, measuring a qubit vector collapses it into a probabilistic classical bit vector,
yielding a single state randomly selected from the exponential set of possible states. Perhaps for
this reason, quantum computers are best at "promise" problems: applications that use some
hidden structure in a problem to find an answer that can be easily verified. Such is the case for
the application domains of the two most famous quantum algorithms, Shor's for prime
factorization of an $n$-bit integer in $O(n^3)$ time[8] and Grover's for searching an unordered
$n$-element list in $O(\sqrt{n})$ queries.[9] The "Quantum Algorithms" sidebar provides additional
information about applications for these algorithms. Obviously, designers of quantum algorithms
must be very clever about how to get useful answers from their computations.

Quantum Algorithms (sidebar)

Recent interest in quantum computers has focused on Peter Shor's algorithm for prime
factorization of large numbers.[1] Shor showed that a quantum computer could, in theory, factor an
$n$-bit integer in $O(n^3)$ time.

Shor's discovery drew a lot of attention. The security of many modern cryptosystems relies on
the seeming intractability of factoring the product of two large primes, given that the best-known
factoring algorithms for a classical computer run in exponential time. To put this in perspective,
researchers using the number field sieve have successfully factored a 512-bit product of two
primes, but it took 8,400 MIPS years.[2] A 1,024-bit product would take approximately 1.6 billion
times longer. That seems intractable.

With Shor's algorithm, you could factor a 512-bit product in about 3.5 hours, assuming the
quantum architecture and error-correction schemes described in this article, and a 1-GHz clock
rate. Under the same assumptions, the algorithm could factor a 1,024-bit number in less than 31
hours.

Another key algorithm is Lov Grover's for searching an unordered list of $n$ elements in
$O(\sqrt{n})$ queries.[3] Quantum algorithms have also been devised for cryptographic key
distribution[4] and clock synchronization.

It is expected, however, that a major application area for quantum computers will be the
simulation of quantum mechanical systems that are too complex to be simulated on classical
computers.[5] This prospect opens possibilities impossible to imagine in the classical world of
our intuition and current computers.

Another problem is that qubits lose their quantum properties exponentially quickly in the
presence of a constant amount of noise per qubit. This sensitivity is referred to as decoherence,
and it is widely believed to be the reason why the world around us is so predominantly classical.
Nevertheless, quantum computation can tolerate a finite amount of decoherence, so the
engineering problem is to contain it to a sufficiently small amount. The relevant measure is the
amount of decoherence per operation, $p$, which has been estimated for a wide range of physical
systems. Specifically, it can range from $10^{-3}$ for electron charge states in GaAs
semiconductors, to $10^{-9}$ for photons, $10^{-13}$ for trapped ions, and $10^{-14}$ for nuclear
spins.[7]
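
A quick way to read the decoherence estimates listed above is to compare each against the threshold estimate quoted earlier in this article. The Python sketch below does that. The dictionary simply restates the figures from the text, the threshold is the article's "as high as" estimate rather than a settled value, and the function name is hypothetical. These are estimated physical limits, not rates achieved in practice.

```python
THRESHOLD = 1e-4  # threshold estimate quoted in this article (assumed as the cutoff)

# Estimated decoherence per operation, as listed in the text.
DECOHERENCE_PER_OP = {
    "electron charge states in GaAs": 1e-3,
    "photons": 1e-9,
    "trapped ions": 1e-13,
    "nuclear spins": 1e-14,
}


def below_threshold(p: float, threshold: float = THRESHOLD) -> bool:
    """True when the per-operation error is small enough for fault tolerance."""
    return p < threshold


for system, p in DECOHERENCE_PER_OP.items():
    verdict = "below" if below_threshold(p) else "above"
    print(f"{system}: p = {p:.0e} ({verdict} threshold)")
```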

How realistic is quantum computation as a technology? We cannot achieve these physical limits
with current technologies, but researchers have proposed concepts for realizing scalable quantum
computers, and initial experiments are promising. Nuclear spins manipulated by nuclear magnetic
resonance (NMR) techniques have demonstrated Shor's algorithm with seven qubits.[10] In these
systems, single-qubit operations take place at about 1 MHz, and two-qubit gates at about 1 kHz,
with an error probability $p \approx 10^{-3}$. It is believed that $p \approx 10^{-6}$ will
ultimately be possible for this kind of device.

Lower error rates are expected to apply for NMR systems that use other techniques, such as
artificial molecules synthesized from solid-state quantum dots[11] or carefully placed phosphorus
impurities in silicon.[3] Faster clock speeds of around 1 GHz should also be possible. For
scalability and to take advantage of a tremendous historical investment in silicon fabrication,
our architecture assumes a solid-state technology such as quantum dots or phosphorus atoms. We
want to use these technologies to provide the building blocks for reliable quantum computation,
much as von Neumann did for classical computation.[12]

We’re  a  long  way  from  system-scale  maturity  in 
today’s  quantum  logic  gates,  but  it  was  also  a  long 
way  from  the  initial  silicon  transistors  to  modern 
VLSI.  We  propose  stepping  in  that  direction. 

PROGRAMMING  MODEL 

Given that a technology solution is possible, how would we implement a quantum algorithm?

Although some early work was done on quantum Turing machines,[13] the quantum computation
community has focused almost entirely on a circuit model[14] in which algorithms and
architecture are tightly integrated, similar to a classical application-specific integrated
circuit, or ASIC. In contrast, our goal is to design a general-purpose piece of hardware that we
can program to perform arbitrary quantum computations.

We can express quantum algorithms through a model that performs quantum operations on
quantum data under the control of a classical computer. Accordingly, quantum programs would
combine quantum unitary transforms (quantum gates), quantum measurements, classical
computation, and classical control-flow decisions into a single instruction stream. A compiler
(such as QCL[15]) then reads a mixed quantum/classical language and breaks down complex quantum
operations into a small set of universal operators. The compiler encodes these operators into a
classical bit instruction stream that also includes conventional processor instructions.

We  anticipate  that  this  compiler  will  have  two 
main  parts:  a  static  precompiler  and  a  dynamic 
compiler.  Both  parts  are  cross-compilers,  running 
on  a  conventional  microprocessor  and  producing 
code  for  our  quantum  architecture. 

The precompiler would generate code that produces a computation with a targeted end-to-end
error probability on an ideal quantum computer. This end-to-end error means that the generated
code must check the answer and restart if it is wrong. Similar to conventional VLSI synthesis
tools, the compiler employs a technology model, but only to the extent that it specifies a
universal set of primitive operations. The compiler does not need any knowledge of error models.

The dynamic compiler accepts the precompiled binary code and produces an instruction stream
to implement a fault-tolerant computation, using the minimal quantum error correction necessary
to meet the end-to-end error rate. This compiler is also given the technology model and,
importantly, a bound on program execution time. Errors occur so infrequently in classical
architectures that program run length is rarely an issue. In quantum architectures, however,
errors are frequent, and correction incurs a polylogarithmic cost in run length. Our work on this
architecture indicates that exploiting program run length is key to performance.
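
To illustrate why a bound on run length lets a compiler choose less error correction, the Python sketch below picks the smallest concatenation level that meets an end-to-end error target. It assumes the standard textbook scaling for concatenated codes, p_L = p_th (p / p_th)^(2^L), and a simple union bound over operations; neither formula is stated in this article, and both function names are hypothetical.

```python
def logical_error_rate(p: float, p_th: float, level: int) -> float:
    """Assumed concatenation scaling: p_L = p_th * (p / p_th) ** (2 ** level)."""
    return p_th * (p / p_th) ** (2 ** level)


def minimal_level(p: float, p_th: float, operations: int, target: float,
                  max_level: int = 10) -> int:
    """Smallest concatenation level whose union-bound failure estimate meets `target`."""
    if p >= p_th:
        raise ValueError("physical error rate must be below threshold")
    for level in range(max_level + 1):
        # Union bound: total failure probability is at most operations * p_L.
        if operations * logical_error_rate(p, p_th, level) <= target:
            return level
    raise ValueError("target not reachable within max_level")


# Short and long programs on the same hardware need different levels.
print(minimal_level(p=1e-6, p_th=1e-4, operations=10**9, target=0.1))
print(minimal_level(p=1e-6, p_th=1e-4, operations=10**15, target=0.1))
```

Under these assumptions, the longer program needs one more level of concatenation than the shorter one, so knowing the run length in advance avoids paying for correction the program does not need.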

The bound on program run length can originate
in either a user hint or dynamic profiling. The hint
expresses the algorithm’s running time given
some input data size. To date, such information
is available for all known quantum algorithms.
If the hint is not available, the compiler uses
an adjustable policy to optimize programs
adaptively. An aggressive policy would start with
minimal error correction and increase reliability
until the program produces the right answer; a
conservative policy would start with extremely
reliable correction and decrease reliability for
future runs.
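
The two adaptive policies can be sketched in a few lines of Python. This is only an illustration under stated assumptions: `run_ok` is a hypothetical callback that executes the program at a given error-correction recursion level and reports whether the answer checked out, and `choose_level` and `max_level` are names introduced here rather than parts of the compiler described above.

```python
from typing import Callable


def choose_level(run_ok: Callable[[int], bool],
                 policy: str = "aggressive",
                 max_level: int = 5) -> int:
    """Pick an error-correction recursion level adaptively.

    run_ok(level) is a hypothetical callback: it runs the program with
    `level` levels of error correction and returns True if the checked
    answer was right.
    """
    if policy == "aggressive":
        # Start with minimal correction; add reliability until the answer is right.
        for level in range(max_level + 1):
            if run_ok(level):
                return level
        raise RuntimeError("no level up to max_level produced a correct answer")
    if policy == "conservative":
        # Start with the most reliable setting; relax it on later runs
        # for as long as the answer stays right.
        level = max_level
        if not run_ok(level):
            raise RuntimeError("even max_level did not produce a correct answer")
        while level > 0 and run_ok(level - 1):
            level -= 1
        return level
    raise ValueError(f"unknown policy: {policy}")


if __name__ == "__main__":
    # Toy stand-in for a real run: pretend the program needs at least level 2.
    needs_two = lambda level: level >= 2
    print(choose_level(needs_two, "aggressive"))
    print(choose_level(needs_two, "conservative"))
```

In this toy model both policies settle on the same level; they differ in how many unreliable or overly expensive runs they spend getting there.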

QUANTUM  ERROR  CORRECTION 

The nonlocalized properties of quantum states
mean that localized errors on a few qubits can
have  a  global  impact  on  the  exponentially  large 
state  space  of  many  qubits.  This  makes  quantum 
error  correction  perhaps  the  single  most  important 
concept  in  devising  a  quantum  architecture.  Unlike 
classical  systems,  which  can  perform  brute-force, 
signal-level  restoration  error  correction  in  every 
transistor,  quantum  state  error  correction  requires 
a  subtle,  complex  strategy. 

Quantum  difficulties 

The  difficulty  of  error-correcting  quantum  states 
has  two  sources. 

First,  errors  in  quantum  computations  are  dis¬ 
tinctly  different  from  errors  in  classical  computing. 
Despite  the  digital  abstraction  of  qubits  as  two-level 
quantum  systems,  qubit  state  probability  ampli¬ 
tudes  are  parameterized  by  continuous  degrees  of 
freedom  that  the  abstraction  does  not  automati¬ 
cally  protect.  Thus,  errors  can  be  continuous  in 
nature,  and  minor  shifts  in  the  superposition  of  a 
qubit  cannot  be  discriminated  from  the  desired 
computation.  In  contrast,  classical  bits  suffer  only 
digital  errors.  Likewise,  where  classical  bits  suffer 
only  bit-flip  errors,  qubits  suffer  both  bit-flip  and 
phase-flip  errors,  since  their  amplitude  signs  can  be 
either  negative  or  positive. 
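
A small numerical sketch may make these error types concrete. It models a single qubit as a pair of amplitudes (a, b); the function names and the sample state are illustrative assumptions, not part of the architecture.

```python
import math


def probabilities(state):
    """Measurement probabilities (P(0), P(1)) for a qubit a|0> + b|1>."""
    a, b = state
    return (abs(a) ** 2, abs(b) ** 2)


def bit_flip(state):
    """Bit-flip error: exchanges the amplitudes of |0> and |1>."""
    a, b = state
    return (b, a)


def phase_flip(state):
    """Phase-flip error: negates the amplitude of |1>."""
    a, b = state
    return (a, -b)


def small_rotation(state, eps):
    """A continuous error: rotate the amplitude pair by a small angle eps."""
    a, b = state
    return (math.cos(eps) * a - math.sin(eps) * b,
            math.sin(eps) * a + math.cos(eps) * b)


if __name__ == "__main__":
    state = (0.6, 0.8)  # illustrative amplitudes with |a|^2 + |b|^2 = 1
    print("original   ", probabilities(state))
    print("bit flip   ", probabilities(bit_flip(state)))
    print("phase flip ", probabilities(phase_flip(state)))
    print("tiny shift ", probabilities(small_rotation((1.0, 0.0), 0.01)))
```

The phase flip leaves both measurement probabilities unchanged, so a check that only looks at bit values cannot see it, and the small rotation shows that the size of an error can take any value rather than only "flipped" or "not flipped".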

The  second  source  of  difficulty  is  that  we  must 
correct  quantum  states  without  measuring  them 
because  measurement  collapses  the  very  superpo¬ 
sitions  we  want  to  preserve. 

Error-correction  code 

Quantum  error-correction  codes  successfully 
address  these  problems  by  using  two  classical  codes 
simultaneously  to  protect  against  both  bit  and  phase 
errors,  while  allowing  measurements  to  determine 
only  information  about  the  error  that  occurred  and 
nothing about the encoded data. An [n, k] code uses
n qubits to encode k qubits of data. The encoding
circuit takes the k data qubits as input, together with
n-k ancilla qubits. Ancilla bits are extra “scratch”
qubits that quantum operations often use; a
specialized entropy exchange unit produces the ancilla
bits and “cools” them to an initial state $|0\rangle$. The
decoder takes in an encoded n-qubit state and outputs
k (possibly erroneous) qubits together with
n-k qubits that, with high probability, specify
which error occurred. A recovery circuit then
performs one of $2^{n-k}$ operations to correct the error on
the data.

Table 1. Recursive error-correction overhead for a single-qubit operation using [7,1] Steane correction code.

Recursion    Storage           Operation           Minimum time
level (k)    overhead $7^k$    overhead $153^k$    overhead $5^k$
0            1                 1                   1
1            7                 153                 5
2            49                23,409              25
3            343               3,581,577           125
4            2,401             547,981,281         625
5            16,807            83,841,135,993      3,125
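
The decode-and-recover step has a close classical analogue that is easy to run. The sketch below uses the classical [7,4] Hamming code, the code the Steane construction applies twice (once for bit flips and once for phase flips). It is a classical illustration only, not a simulation of the quantum decoder, and the function names are ours.

```python
# Parity-check matrix of the classical [7,4] Hamming code.
# Column j (1-based) is the binary representation of j.
PARITY_CHECKS = [
    [0, 0, 0, 1, 1, 1, 1],
    [0, 1, 1, 0, 0, 1, 1],
    [1, 0, 1, 0, 1, 0, 1],
]


def syndrome(word):
    """n - k = 3 parity bits that depend only on the error, not on the data."""
    return tuple(sum(h * w for h, w in zip(row, word)) % 2
                 for row in PARITY_CHECKS)


def correct(word):
    """Apply one of 2**3 = 8 recovery operations selected by the syndrome."""
    s = syndrome(word)
    position = 4 * s[0] + 2 * s[1] + s[2]  # 0 means "no error detected"
    fixed = list(word)
    if position:
        fixed[position - 1] ^= 1
    return fixed


if __name__ == "__main__":
    codeword = [1, 1, 1, 0, 0, 0, 0]   # a valid codeword
    damaged = list(codeword)
    damaged[4] ^= 1                     # flip the fifth bit
    print(syndrome(codeword), syndrome(damaged), correct(damaged))
```

Flipping the same bit of a different codeword produces the same syndrome, which is the classical counterpart of learning about the error while learning nothing about the encoded data.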

This  model  assumes  that  qubit  errors  are  inde¬ 
pendent  and  identically  distributed.  Classical  error 
correction  makes  the  same  assumption,  and  we  can 
adapt  classical  strategies  for  handling  deviations  to 
the  quantum  model. 

Quantum  error  correction  has  a  powerful  and 
subtle  effect.  Without  it,  the  “correctness” — tech¬ 
nically,  the  fidelity — of  a  physical  qubit  decays  expo¬ 
nentially  and  continuously  with  time.  With  it,  the 
exponential  error  model  becomes  linear:  A  logical 
qubit  encoded  in  a  quantum  error-correcting  code 
and  undergoing  periodic  error  measurement  suffers 
only  linear  discrete  amounts  of  error,  to  first  order. 

Not all available codes are suitable for fault-tolerant
computation, but the largest class, the stabilizer
codes, supports computation without decoding the
data, a step that would propagate more errors. We
chose the [7,1] Steane stabilizer code for our
architecture. It uses seven physical qubits to
encode one logical qubit and is nearly
optimal  (the  smallest  perfect  quantum  code  is 
[5,1] 16).  The  code  can  perform  an  important  set  of 
single-qubit  operations  as  well  as  the  two-qubit 
controlled-NOT  operator  (used  in  the  architecture’s 
quantum  ALU)  on  the  encoded  qubit  simply  by 
applying  the  operations  to  each  individual  physi¬ 
cal  qubit. 


Error-correction  costs 

The  cost  of  error  correction  is  the  overhead 
needed  to  compute  encoded  states  and  to  perform 
periodic error-correction steps. Each such step is a
fault-tolerant operation. The Steane code requires
approximately 153 physical gates to construct a
fault-tolerant single-qubit operation.

Despite this substantial cost, the 7-qubit
error-correcting code dramatically improves the
quantum computing situation. The probability of a
logical qubit error occurring during a single
operation changes from $p$ to $cp^2$, where $c$ is a constant
determined by the number of places two or more
failures can occur and propagate to the next
logical qubit, and we want $cp^2 < p$.

For a single logical gate application, $c$ is about
17,446. For a physical qubit transform failure rate
of $p = 10^{-6}$, this means the 7-qubit Steane code has
a probable logical qubit transform failure rate of
$1.6 \times 10^{-7}$ when a maximally parallelized
operation uses an optimized error measurement
procedure.16 Producing systems with a lower $c$ and more
reasonable overheads requires a failure rate that is
closer to $10^{-9}$.

Recursive  error  correction 

The most important application of quantum
codes to computation is a recursive construction,5
which exponentially decreases error probabilities
with only polynomial effort. This is crucial because
even an error probability of $cp^2$ is too high for most
quantum applications.

The following example illustrates the
construction: The Steane code transforms the physical
qubit error rate $p$ to a logical qubit error rate
$cp^2$ but requires some number of physical qubit
gates per logical qubit gate operation. Suppose,
however, that each of those physical gates were
itself implemented as a logical gate on a 7-qubit code.
Each such gate would have an error rate of $cp^2$, and
the overall logical gate error rate would become
$c(cp^2)^2$. For a technology with $p = 10^{-6}$, the error
rate for each upper level gate would be roughly
$4.3 \times 10^{-10}$. The key observation is that as long as
$cp^2 < p$, error probabilities decrease exponentially
with only a polynomial increase in overhead.
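
The recursion can be written compactly. The short derivation below assumes, for illustration, that the same constant $c$ applies at every level; the symbol $p_k$ for the error rate after $k$ levels is introduced here and does not appear in the original text.

```latex
\begin{align}
  p_0 &= p, \qquad p_k = c\,p_{k-1}^{2} \\
  c\,p_k &= \left(c\,p_{k-1}\right)^{2} = \left(c\,p\right)^{2^{k}} \\
  p_k &= \frac{\left(c\,p\right)^{2^{k}}}{c}
\end{align}
```

For $k = 1$ and $k = 2$ this reproduces $cp^2$ and $c(cp^2)^2$. The error rate falls with $k$ only when $cp < 1$, which is the same condition as $cp^2 < p$, and it then falls doubly exponentially in $k$ while the gate overhead grows as $153^k$.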

Figure 1. Recursion level k with a varied problem size
and underlying qubit error probability for Shor’s
quantum factorization algorithm (left) and Grover’s
quantum search algorithm (right).

Table 1 summarizes the costs of recursive error
correction up to five levels for storage, operation,
and minimum time overheads. Clearly, the high cost
of recursive error correction means that a quantum
computer architecture should choose the minimum
recursion level for a given algorithm and data size.
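
The entries of Table 1 follow directly from the three per-level factors (7 qubits, 153 gates, 5 time steps). A minimal sketch that regenerates them, with a function name and parameter names of our own choosing:

```python
def recursion_overhead(level, code_size=7, gates_per_op=153, time_per_op=5):
    """Overhead factors for `level` levels of recursive error correction.

    The defaults are the per-level factors quoted for the [7,1] Steane code.
    """
    if level < 0:
        raise ValueError("recursion level must be non-negative")
    return {
        "storage": code_size ** level,
        "operations": gates_per_op ** level,
        "min_time": time_per_op ** level,
    }


if __name__ == "__main__":
    for k in range(6):
        row = recursion_overhead(k)
        print(k, f"{row['storage']:,}", f"{row['operations']:,}",
              f"{row['min_time']:,}")
```

Passing a different `code_size` or `gates_per_op` gives the corresponding figures for another code, provided its per-level factors are known.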

Figure  1  depicts  recursion  level  k  with  a  varied 
problem  size  and  underlying  qubit  error  probabil¬ 
ity  for  both  Shor’s  and  Grover’s  algorithms. 
Increases  in  problem  size  or  error  probability 
require  stronger  error  correction  through  addi¬ 
tional  levels  of  recursion. 
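
One simple way to see why larger problems and noisier qubits push the recursion level up is to search for the smallest level that keeps the whole computation under a failure budget. The sketch below is a rough model built on assumptions of ours: every level squares the error rate with the same constant `c`, and the overall failure probability is bounded by the number of logical operations times the per-operation rate. It is not the procedure used to produce Figure 1.

```python
def logical_error_rate(p, c, level):
    """Per-operation error rate after `level` levels, using p_k = c * p_(k-1)**2."""
    rate = p
    for _ in range(level):
        rate = c * rate * rate
    return rate


def min_recursion_level(p, c, num_ops, target_failure, max_level=10):
    """Smallest level with num_ops * p_k <= target_failure (union-bound model)."""
    if c * p >= 1:
        raise ValueError("recursion only helps when c * p < 1")
    for level in range(max_level + 1):
        if num_ops * logical_error_rate(p, c, level) <= target_failure:
            return level
    raise ValueError("target not reachable within max_level")


if __name__ == "__main__":
    c = 17446  # constant quoted in the text for a single logical gate
    for p in (1e-6, 1e-9):
        for num_ops in (1e3, 1e12):
            print(p, num_ops, min_recursion_level(p, c, num_ops, 0.5))
```

Because the level is an integer, the required level stays flat over wide ranges of problem size and then jumps by one.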

QUANTUM  COMPUTER  ARCHITECTURE 

Building  upon  the  theory  of  fault-tolerant  quan¬ 
tum  computation,  we  define  the  building  blocks  for 
a  general  architecture  that  can  dynamically  mini¬ 
mize  error-correction  overhead.  In  contrast  to  the 
circuit  model  used  in  much  of  the  quantum  com¬ 
puting  literature,  our  architecture  can  efficiently 
support  different  algorithms  and  data  sizes.  The 
key  mechanisms  enabling  this  generalization  are 
reliable  data  paths  and  efficient  quantum  memory. 

In  many  respects,  quantum  computation  is  sim¬ 
ilar  to  classical  computation.  For  example,  quan¬ 
tum  algorithms  have  a  well-defined  control  flow 
that  manipulates  individual  data  items  throughout 
the  execution.  The  physical  restrictions  on  quan¬ 
tum  technologies  also  resemble  the  classical 
domain.  Even  though  two  qubits  can  interact  at  a 
distance,  the  strongest — and  least  error-prone — 
interaction  is  between  near  neighbors.  Further¬ 
more,  controlled  interaction  requires  classical 
support  circuitry,  which  must  be  routed  appropri¬ 
ately  throughout  the  device. 

Although our quantum computer architecture is
similar to a classical architecture, certain aspects of
the computation are unique to the quantum
domain. As Figure 2 shows, the overall architecture
has three major components: the quantum
arithmetic logic unit (ALU), quantum memory, and
a dynamic scheduler. In addition, the architecture
uses a novel quantum wiring technique that
exploits quantum teleportation.17

Quantum  ALU 

At  the  core  of  our  architecture  is  the  quantum 
ALU,  which  performs  quantum  operations  for  both 
computation  and  error  correction.  To  efficiently 
perform  any  specified  quantum  gates  on  the  quan¬ 
tum  data,  the  ALU  applies  a  sequence  of  basic 
quantum  transforms  under  classical  control.7  The 
transforms  include 

• the Hadamard (a radix-2, 1-qubit Fourier
transform),

• identity (I, a quantum NOP),

• bit flip (X, a quantum NOT),

• phase flip (Z, which changes the signs of
amplitudes),

• bit and phase flip (Y),

• rotation by π/4 (S),

• rotation by π/8 (T), and

• controlled NOT (CNOT).

These  gates  form  one  of  the  smallest  possible  uni¬ 
versal  sets  for  quantum  computation.  The  under¬ 
lying  physical  quantum  technology  can  implement 
these  gates  efficiently  on  encoded  data.  All  except 
CNOT  operate  on  only  a  single  qubit;  the  CNOT 
gate  operates  on  two  qubits. 
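
For reference, the gates listed above can be written as small matrices and checked numerically. The sketch uses the common textbook matrix forms, in which S and T equal the π/4 and π/8 rotations up to an unobservable global phase. The helper names are ours, and the code describes the mathematics of the gates, not the ALU hardware.

```python
import cmath
import math


def matmul(a, b):
    """Multiply two square matrices given as nested lists."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def dagger(m):
    """Conjugate transpose of a square matrix."""
    n = len(m)
    return [[complex(m[j][i]).conjugate() for j in range(n)] for i in range(n)]


def identity(n):
    return [[1 if i == j else 0 for j in range(n)] for i in range(n)]


def close(a, b, tol=1e-12):
    """Entrywise comparison of two matrices within a tolerance."""
    return all(abs(x - y) < tol for ra, rb in zip(a, b) for x, y in zip(ra, rb))


def is_unitary(m):
    """A gate is reversible and probability-preserving iff it is unitary."""
    return close(matmul(dagger(m), m), identity(len(m)))


R = 1 / math.sqrt(2)
GATES = {
    "H": [[R, R], [R, -R]],
    "I": identity(2),
    "X": [[0, 1], [1, 0]],
    "Z": [[1, 0], [0, -1]],
    "Y": [[0, -1j], [1j, 0]],
    "S": [[1, 0], [0, 1j]],
    "T": [[1, 0], [0, cmath.exp(1j * math.pi / 4)]],
    # Basis order |control, target>: the target flips when the control is 1.
    "CNOT": [[1, 0, 0, 0], [0, 1, 0, 0], [0, 0, 0, 1], [0, 0, 1, 0]],
}

if __name__ == "__main__":
    for name, gate in GATES.items():
        print(name, "unitary:", is_unitary(gate))
    print("T*T == S:", close(matmul(GATES["T"], GATES["T"]), GATES["S"]))
    print("S*S == Z:", close(matmul(GATES["S"], GATES["S"]), GATES["Z"]))
```

The checks also show that the list is not minimal in a strict sense: S and Z are powers of T, and Y is a product of X and Z up to a factor of i.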

To perform the high-level task of error correction,
the ALU applies a sequence of elementary
operations. Because this task is requisite to
fault-tolerant quantum computing, the ALU performs it
on encoded data after most logical operations. This
procedure consumes ancilla states, which help in
the computation of parity checks. Specialized
hardware provides elementary standard states that the
ALU uses to manufacture requisite ancilla.

Figure 2. Fault-tolerant quantum computer
architecture. The quantum arithmetic logic unit
(ALU) performs all quantum operations, quantum
memory banks support efficient code conversion,
teleportation transmits quantum states without
sending quantum data, and the dynamic scheduler
controls all processes.


Quantum  memory 

The  architecture’s  generality  relies  on  an  efficient 
quantum  memory.  The  key  is  building  quantum 
memory  banks  that  are  more  reliable  than  quan¬ 
tum  computation  devices.  We  can  also  use  special¬ 
ized  “refresh”  units  that  are  much  less  complex 
than  our  general  ALU. 

The  storage  of  qubits  not  undergoing  computa¬ 
tion  is  very  similar  to  the  storage  of  conventional 
dynamic  RAM.  Just  as  individual  capacitors  used 
for  DRAM  leak  into  the  surrounding  substrate  over 
time,  qubits  couple  to  the  surrounding  environment 
and  decohere  over  time.  This  requires  periodically 
refreshing  individual  logical  qubits.  As  Figure  2 
shows,  each  qubit  memory  bank  has  a  dedicated 
refresh unit that periodically performs error
detection and recovery on the logical qubits. From a
technological standpoint, decoherence-free
subsystems,18 which naturally provide lower
decoherence rates for static qubits, could implement such
quantum memories.

The architecture uses multiple quantum memory
banks. The reason is not to improve logical qubit access
times. Rather, the underlying error rate of the qubit’s
physical storage mechanism, the algorithm’s
complexity and input data size, the quantum ALU’s
operation time and parallelism, and the
error-correction code that stores the logical qubits limit the
bank size. For example, if we run Shor’s algorithm
on a 1,024-bit number using a memory technology
with an error rate of $p = 10^{-9}$, we estimate that it
would use 28,000 physical qubits to represent about
1,000 physical bits using two levels of recursion in
a 5-qubit error-correction code. On the other hand,
if the error rate increases to $p = 10^{-6}$, error
correction would require four levels of recursion to refresh
a bank size of just 1,000 physical qubits that would
store only two logical qubits.

Quantum  wires 

Moving  information  around  in  a  quantum  com¬ 
puter  is  a  challenge.  Quantum  operations  must  be 
reversible,  and  we  cannot  perfectly  clone  qubits — 
that is, we cannot copy their value. We cannot
simply place a qubit on a wire and expect it to transmit
the qubit’s state accordingly. Instead, our
architecture will use a purely quantum concept to
implement quantum wires: teleportation.17 This
procedure, which has been experimentally
demonstrated,19 transmits a quantum state between two
points  without  actually  sending  any  quantum  data. 
Instead,  with  the  aid  of  a  certain  standard  preshared 
state,  teleportation  sends  two  classical  bits  of  data 
for  each  qubit. 

Teleportation  is  superior  to  other  means  of  deliv¬ 
ering  quantum  states.  Recall  that  a  solid-state  tech¬ 
nology  implements  qubits  with  atoms  implanted  in 
silicon.3,11  The  physical  qubits  cannot  move,  but  we 
can  apply  a  swap  operation  to  progressive  pairs  of 
atoms  to  move  the  qubit  values  along  a  line  of 
atoms.  While  we  could  use  a  series  of  quantum  swap 
gates  to  implement  quantum  wires,  each  swap  gate 
is  composed  of  three  CNOT  gates,  which  introduces 
errors  in  the  physical  qubits — errors  that  generate 
additional  overhead  in  the  correction  procedures. 
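
The three-CNOT decomposition of a swap, and the gate count of a swap-based wire, can be checked on basis states. Since CNOT only permutes basis states, verifying every bit pattern is enough by linearity. The function names below are illustrative.

```python
def cnot(bits, control, target):
    """Action of CNOT on a basis state given as a tuple of bits."""
    out = list(bits)
    out[target] ^= out[control]
    return tuple(out)


def swap_via_cnots(bits, i, j):
    """Swap positions i and j using exactly three CNOT gates."""
    bits = cnot(bits, i, j)
    bits = cnot(bits, j, i)
    bits = cnot(bits, i, j)
    return bits


def move_along_line(bits, src, dst):
    """Move the value at src to dst (src <= dst) by swapping neighbors.

    Returns the new bit pattern and the number of CNOT gates used.
    """
    cnot_count = 0
    for k in range(src, dst):
        bits = swap_via_cnots(bits, k, k + 1)
        cnot_count += 3
    return bits, cnot_count


if __name__ == "__main__":
    for a in (0, 1):
        for b in (0, 1):
            print((a, b), "->", swap_via_cnots((a, b), 0, 1))
    print(move_along_line((1, 0, 0, 0), 0, 3))
```

The gate count grows linearly with distance at three CNOTs per hop, and each of those gates is an opportunity for a physical error on the data being moved.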

Teleportation  instead  uses  quantum  swap  gates 
that  are  not  error-corrected  to  distribute  qubits  in 
a  cat  state  to  the  source  and  destination  of  the  wire. 
A cat state (named after Schrödinger’s cat) is a qubit
vector  with  probabilities  equally  distributed 
between  all  bits  set  to  1  and  all  bits  set  to  0.  The 
qubits  in  a  cat  state  are  entangled,  and  measuring 
one  of  the  qubits  uniquely  determines  the  state  of 
all  qubits  in  the  qubit  vector.  Teleportation  uses  a 
two-qubit  cat  state. 

This  cat  state  can  be  checked  for  errors  easily  and 
independently  of  the  physical  qubit  being  trans¬ 
mitted.  If  errors  have  overwhelmed  the  cat  state,  it 
can  be  discarded  with  little  harm  to  the  transmis¬ 
sion  process.  Once  a  correct  cat  state  exists  at  both 
ends,  the  cat  state’s  qubits  teleport  the  physical 
qubit  across  the  required  distance. 
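
The standard teleportation protocol is small enough to simulate directly. The sketch below tracks the eight amplitudes of three qubits: the data qubit and the two halves of a two-qubit cat state. Instead of sampling a random measurement, it enumerates all four outcomes. The gate helpers and the indexing convention are our own choices, and the model is noise-free.

```python
import math

R = 1 / math.sqrt(2)
H = [[R, R], [R, -R]]


def apply_1q(state, gate, q, n=3):
    """Apply a 2x2 gate to qubit q (0 = leftmost) of an n-qubit state vector."""
    new = [0j] * len(state)
    shift = n - 1 - q
    for i, amp in enumerate(state):
        bit = (i >> shift) & 1
        for out in (0, 1):
            j = (i & ~(1 << shift)) | (out << shift)
            new[j] += gate[out][bit] * amp
    return new


def apply_cnot(state, control, target, n=3):
    """Flip the target qubit of every basis state whose control qubit is 1."""
    new = [0j] * len(state)
    cs, ts = n - 1 - control, n - 1 - target
    for i, amp in enumerate(state):
        j = i ^ (1 << ts) if (i >> cs) & 1 else i
        new[j] += amp
    return new


def teleport(a, b):
    """Teleport a|0> + b|1>; return the receiver's qubit for each outcome."""
    # Qubit 0 holds the data; qubits 1 and 2 share (|00> + |11>)/sqrt(2).
    state = [0j] * 8
    state[0b000] = state[0b011] = a * R
    state[0b100] = state[0b111] = b * R
    state = apply_cnot(state, 0, 1)
    state = apply_1q(state, H, 0)
    results = {}
    for m0 in (0, 1):
        for m1 in (0, 1):
            # Project qubits 0 and 1 onto the measured values (m0, m1).
            base = (m0 << 2) | (m1 << 1)
            q2 = [state[base], state[base | 1]]
            norm = math.sqrt(sum(abs(x) ** 2 for x in q2))
            q2 = [x / norm for x in q2]
            # The two classical bits select the receiver's correction.
            if m1:
                q2 = [q2[1], q2[0]]    # bit flip (X)
            if m0:
                q2 = [q2[0], -q2[1]]   # phase flip (Z)
            results[(m0, m1)] = q2
    return results


if __name__ == "__main__":
    for outcome, qubit in teleport(0.6, 0.8j).items():
        print(outcome, qubit)
```

Whatever the two measured bits turn out to be, the receiver ends up with the original amplitudes, and only those two classical bits travel from sender to receiver.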

Code  teleportation 

Teleportation  can  also  provide  a  general  mecha¬ 
nism  for  simultaneously  performing  quantum  oper¬ 
ations  while  transporting  quantum  data. 
Precomputing  the  desired  operation  on  the  cat 
states  forms  a  kind  of  “quantum  software”  that 
automatically  performs  its  operation  on  the  tele¬ 
ported  data.20  We  can  use  this  mechanism  to  per¬ 
form  an  optimization  by  converting  between 
different  error-correction  codes  during  teleporta¬ 
tion.  Specifically,  we  chose  the  Steane  error-correc¬ 
tion  code  for  its  computational  ease,  not  its 
compactness.  The  quantum  memories,  however, 
perform  only  error  measurement  and  recovery,  not 
computation. Hence, they can use a more compact
code that sacrifices some ease of computation.

Converting  between  codes  is  usually  an  error- 
prone  process,  but  teleportation  performs  code  con¬ 
version  without  a  single  physical  qubit  error 
compromising  a  complete  logical  qubit  state.20 
Thus,  our  architecture  can  store  the  logical  qubits 
efficiently  in  a  dense  error-correcting  code  if  it  uses 
teleportation  during  transmission  to  the  quantum 
ALU  for  conversion  to  a  less  compact,  but  more 
easily  computable,  error-correction  code. 

From  a  conceptual  standpoint,  this  process  is 
only  a  slight  modification  of  standard  quantum 
teleportation.  As  Figure  3  shows,  specialized  hard¬ 
ware  generates  a  cat  state,  sends  one  qubit  through 
the  encoding  mechanism  for  the  source  error- 
correction  code,  and  sends  the  other  qubit  through 
the  encoder  for  the  destination  error-correction 
code.  The  sender  and  receiver  then  perform  the  log¬ 
ical  qubit  equivalents  of  the  teleportation  opera¬ 
tion  on  each  end  of  the  entangled  pair. 

To  implement  a  more  robust  form  of  this  process, 
the  underlying  architecture  could  use  stabilizer 
measurements  to  generate  the  appropriately 
encoded  cat  states  prior  to  teleportation. 


Figure 3. Code teleportation. In a slight
modification of standard quantum teleportation, an
encoder at the destination recreates quantum data
encoded in another form at the sender.


Dynamic  scheduler 

The  architecture  uses  a  complete  high-perfor¬ 
mance  classical  processor  for  control.  This  proces¬ 
sor  runs  a  dynamic  scheduling  algorithm  that  takes 
in  logical  quantum  operations,  interleaved  with  clas¬ 
sical  control-flow  constructs,  and  dynamically  trans¬ 
lates  them  into  physical  individual  qubit  operations. 
The  algorithm  uses  knowledge  about  the  overall 
input  data  size  and  physical  qubit  error  rates  to  con¬ 
struct  a  dynamic  schedule  to  control  the  quantum 
ALU,  code  teleportation,  and  qubit  RAM  refresh 
units.  This  is  a  lot  of  work  for  a  single  classical 
processor.  We  expect  significantly  faster  processor 
clock  speeds  to  be  available,  but  it  may  be  necessary 
to  run  multiple  classical  processors  in  parallel. 

The classical processor is critical to making a
quantum architecture efficient. We could execute
all quantum algorithms with the maximum available
error correction, but doing so would be
incredibly inefficient. Moreover, using dynamic
compilation and knowledge of an algorithm’s
execution time makes several performance
optimizations available to the computation, including
application-specific clustering prior to error
measurement.

Figure 4. Quantum computing performance with
recursive error correction. The recursive approach
to stronger error correction results in a stairstep
curve.

APPLICATION-SPECIFIC  ERROR  OPTIMIZATION 

While  theoretically  possible,  quantum  error  cor¬ 
rection  introduces  overheads  yet  unheard  of  in  the 
classical  domain.  A  single  level  of  error  correction 
incurs  an  overhead  of  at  least  153  quantum  gates 
per logical operation in our architecture; a k-level
recursive scheme has a factor of $153^k$ overhead.

The  scheduling  unit  ultimately  implements  mech¬ 
anisms  to  control  this  overhead  dynamically  at  exe¬ 
cution  time.  This  unit  compiles  the  quantum 
software  instructions  (that  operate  on  logical 
qubits)  into  the  specific  quantum  operations 
required  for  execution  on  the  physical  qubits  of  the 
error-correction  codes  used  throughout  the  archi¬ 
tecture.  Furthermore,  the  unit  dynamically  sched¬ 
ules  the  quantum  operations  to  intermix  classical 
control-flow  constructs  with  the  quantum  opera¬ 
tions,  while  fully  utilizing  the  available  quantum 
ALU  functional  units. 

Figure  4  abstractly  depicts  the  effects  of  recur¬ 
sive  error  correction  on  execution  time.  As  appli¬ 
cation  data  size  increases,  so  must  the  recursive 
structure,  but  the  recursion  increases  occur  at  inte¬ 
gral  steps.  Using  the  classical  processor  for  just-in- 
time  quantum  software  compilation,  we  customize 
the  error  correction  to  the  algorithm  and  data  size. 
This  customization  aggregates  the  cost  of  error-cor¬ 
rection  processes  over  several  operations,  thereby 
making the integral cost more continuous.

Our architecture achieves system-level efficiencies
through code teleportation, quantum
memory refresh units, dynamic compilation
of quantum programs, and scalable error
correction. Our work indicates that reliability of the
underlying technology is crucial; practical
architectures will require quantum technologies with
error rates between $10^{-6}$ and $10^{-9}$.

In  addition  to  the  underlying  technology,  the  sig¬ 
nificant  overhead  of  quantum  error  correction 
remains  the  most  pressing  quantum  computing 
architectural  issue.  The  clustering  solution  we  pro¬ 
pose  can  regain  some  of  the  performance  lost  from 
recursive  error  correction,  but  the  gains  are  limited 
to  the  cost  of  only  a  single  recursion  layer.  Further 
reductions  will  require  other  new  techniques. 
Quantum  theorists  are  working  on  new  correction 
codes  with  attractive  properties.  Some  can  correct 
for  more  than  a  single  error  or  condense  more  than 
one  logical  qubit  together  to  increase  density. 

The  key  to  exploiting  these  algorithmic  devel¬ 
opments  in  a  quantum  architecture  is  to  identify 
the  basic  building  blocks  from  which  a  design 
methodology can grow. We hope to lay the
foundation for a science of quantum CAD for the
reliable quantum computers of the future. ■

Compilers and computer-aided design tools are essential for fine-grained control of
nanoscale  quantum-mechanical  systems.  A  proposed  four-phase  design  flow  assists  with 
computations  by  transforming  a  quantum  algorithm  from  a  high-level  language  program 
into  precisely  scheduled  physical  actions. 


Quantum  computers  have  the  potential  to  solve 
certain  computational  problems — for  example, 
factoring  composite  numbers  or  comparing  an 
unknown  image  against  a  large  database — 
more  efficiently  than  modern  computers.  They 
are  also  useful  in  controlling  quantum-mechan¬ 
ical  systems  in  emergent  nanotechnology  applications, 
such  as  secure  optical  communication,  in  which  mod¬ 
ern  computers  cannot  natively  operate  on  quantum  data. 

Despite  convincing  laboratory  demonstrations  of 
quantum  information  processing,  as  the  “Ongoing 
Research  in  Quantum  Computing”  sidebar  describes,  it 
remains  difficult  to  scale  because  it  relies  on  inherently 
noisy  components.  Adequate  use  of  quantum  error  cor¬ 
rection  and  fault  tolerance  theoretically  should  enable 
much  better  scaling,  but  the  sheer  complexity  of  the  tech¬ 
niques  involved  limits  what  is  doable  today.  Large  quan¬ 
tum  computations  must  also  achieve  a  high  degree  of 
parallelism  to  complete  before  quantum  states  decohere. 

As  candidate  quantum  technologies  mature,  the  fea¬ 
sibility  of  quantum  computation  will  increasingly 
depend  on  software  tools,  especially  compilers,  that 
translate  quantum  algorithms  into  low-level,  technol¬ 
ogy-specific  instructions  and  circuits  with  added  fault 
tolerance  and  sufficient  parallelism. 

We propose a layered software architecture consisting
of a four-phase computer-aided design flow that
assists with such computations by mapping a high-level
language source program representing a quantum
algorithm onto a quantum device. By weighing different
optimization and error-correction procedures at
appropriate phases of the design flow, researchers, algorithm
designers, and tool builders can trade off performance
and accuracy.


QUANTUM  COMPUTATION 

The quantum circuit, a commonly used computation
model similar to a modern digital circuit, provides
a  representation  of  a  quantum  algorithm.  Digital  cir¬ 
cuits  capture  both  mathematical  algorithms,  such  as 
for  sorting  and  searching,  and  methods  for  real-world 
control  and  measurement,  as  in  cellular  phones  and 
automobiles.  Quantum  circuits  likewise  describe  meth¬ 
ods  for  control  of  quantum  systems,  such  as  atomic 
clocks  and  optical  communication  links,  that  cannot 
be  fully  controlled  with  conventional  binary  digital  cir¬ 
cuits  alone. 

A quantum circuit consists of quantum bits (qubits),
quantum gates, quantum wires, and qubit measurements.
A qubit is analogous to a classical bit but can be
in a wave-like superposition of the symbolic bit values
0 and 1, written $a|0\rangle + b|1\rangle$, where a and b are complex
numbers. Mathematically, a qubit can be written as a
vector of complex numbers. When measured, a qubit

Ongoing Research in Quantum Computing


Researchers in industry and government labs are exploring
various aspects of quantum design and automation with a wide
range of applications. In addition to the examples described
below, universities in the US, Canada, Europe, Japan, and China

BBN  Technologies 

Based  in  Cambridge,  Massachusetts,  BBN  Technologies 
(www.bbn.com)  developed  the  world’s  first  quantum  key  dis¬ 
tribution  (QKD)  network  with  funding  from  the  US  Defense 
Advanced Research Projects Agency. The fiber-optical DARPA
Quantum Network offers 24x7 quantum cryptography to
secure standard Internet traffic such as Web browsing,
e-commerce, and streaming video.

D-Wave  Systems 

Located  in  Vancouver,  British  Columbia,  Canada,  D-Wave 
Systems  (www.dwavesys.com)  builds  superconductor-based 
software-programmable  custom  integrated  circuits  for  quan¬ 
tum  optimization  algorithms  and  quantum-physical  simulations. 
These  ICs  form  the  heart  of  a  quantum  computing  system 
designed  to  deliver  massively  more  powerful  and  faster  perfor¬ 
mance  for  cryptanalysis,  logistics,  bioinformatics,  and  other  appli¬ 
cations. 

Hewlett-Packard 

The  Quantum  Science  Research  Group  at  HP  Labs  in  Palo 
Alto,  California,  is  exploring  nanoscale  quantum  optics  for  infor¬ 
mation-processing  applications  (www.hpl.hp.com/research/qsr). 
In  addition,  the  Quantum  Information  Processing  Group  at  the 
company’s  research  facility  in  Bristol,  UK,  is  studying  quantum 
computation,  cryptography,  and  teleportation  and  communica¬ 
tion  (www.hpl.hp.com/research/qip). 

Hypres 

Located in Elmsford, New York, Hypres Inc.
(www.hypres.com) is the leading developer of superconducting digital
circuits  for  wireless  and  optical  communication.  Based  on  rapid 
single-flux  quantum  logic,  these  circuits  have  achieved  gate 
speeds  up  to  770  GHz  in  the  laboratory. 

IBM  Research 

Scientists  at  IBM’s  Almaden  Research  Center  in  California  and 
the  T.J.  Watson  Research  Center’s  Yorktown  office  in  New  York 
developed a nuclear magnetic resonance (NMR) quantum
computer that factored 15 into 3 × 5
(http://archives.cnn.com/2000/TECH/computing/08/15/quantum.reut).
Researchers at the Watson facility and the Zurich Research Lab
are also developing Josephson junction quantum devices
(www.research.ibm.com/ss_computing) as well as studying
quantum information theory
(www.research.ibm.com/quantuminfo).


Id  Quantique 

Based in Geneva, Switzerland, id Quantique
(www.idquantique.com) is a leading provider of quantum
cryptography solutions, including wire-speed link encryptors,
QKD appliances, a turnkey service for securing communication
transfers, and quantum random number generators. The company’s
optical instrumentation product portfolio includes single-photon
counters and short-pulse laser sources.

Los  Alamos  National  Lab 

The  Los  Alamos  National  Lab  (http://qso.lanl.gov/qc)  in  New 
Mexico  is  studying  quantum-optical  long-distance  secure  com¬ 
munications  and  QKD  for  satellite  communications.  It  has  also 
conducted  groundbreaking  work  on  quantum  error  correction, 
decoherence,  quantum  teleportation,  and  the  adaptation  of 
NMR  technology  to  quantum  information  processing. 

MagiQ  Technologies 

MagiQ  Technologies  (www.magiqtech.com),  headquartered 
in  New  York  City,  launched  the  world’s  first  commercial  quan¬ 
tum  cryptography  device  in  2003.  MagiQ  Quantum  Private 
Network  systems  incorporate  QKD  over  metro-area  fiber¬ 
optic  links  to  protect  against  both  cryptographic  deciphering 
and  industrial  espionage. 

NEC  Labs 

Scientists  at  NEC’s  Fundamental  and  Environmental  Research 
Laboratories  in  Japan,  in  collaboration  with  the  Riken  Institute 
of  Physical  and  Chemical  Research,  have  demonstrated  a  basic 
quantum circuit in a solid-state quantum device
(www.labs.nec.co.jp/Eng/innovative/E3/top.html). Recently, NEC researchers
have  also  been  involved  in  realizing  the  fastest  fortnight-long, 
continuous  quantum  cryptography  final-key  generation. 

NIST 

The Quantum Information Program at the US National Institute
of Standards and Technology (http://qubit.nist.gov) is building a
prototype 10-qubit quantum processor as a proof-in-principle of
quantum information processing. Potential applications include
ultraprecise  measurement  (atomic  clocks,  optical  metrology,  and 
so  on),  control  of  dynamic  processes,  and  nanotechnology. 
Researchers  at  the  program’s  facilities  in  Boulder,  Colorado,  and 
Gaithersburg,  Maryland,  are  also  optimizing  the  speed  of  free- 
space  quantum  cryptography  systems. 

NTT  Basic  Research  Labs 

NTT’s  Superconducting  Quantum  Physics  Research  Group 
in  Japan  focuses  on  the  development  of  quantum  cryptography 
protocols  (www.brl.ntt.co.jp/group/shitsuryo-g/qc).  In  particu¬ 
lar,  they  have  exhibited  quantum  cryptography  using  a  single 
photon  realized  in  a  photonic  network  of  optical  fibers. 
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Figure  1.  Proposed  design  flow.  The  first  three  phases  are  part  of  the  quantum  computer 
compiler, while the last phase implements the quantum algorithm on a quantum device or
simulator. 


the value 0 or the value 1, with probabilities $|a|^2$ and
$|b|^2$, respectively.

An n-qubit quantum state is written as a vector representing
a superposition of $2^n$ different bit strings. The state
remains in a superposition for the computation’s duration, and
the final sequence of measurements collapses the state onto the
bit string that gives the result of the computation. This result
will not be affected if all bit strings in a given state are
multiplied by a constant, called a global phase, before
measurement. However, the ratios of coefficients of different
bit strings are significant and determine relative phases.
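
The distinction between global and relative phase can be checked numerically. The following is a minimal sketch, not part of the original tool suite; the helper names `hadamard` and `probabilities` and the phase value 0.7 are illustrative assumptions.

```python
import cmath
import math

S = 1 / math.sqrt(2)

def hadamard(state):
    """Apply a Hadamard gate to a single-qubit state [a, b]."""
    a, b = state
    return [S * (a + b), S * (a - b)]

def probabilities(state):
    """Probabilities of measuring 0 and 1 in the computational basis."""
    return [abs(amp) ** 2 for amp in state]

plus = [S, S]                    # (|0> + |1>)/sqrt(2)
minus = [S, -S]                  # same magnitudes, relative phase -1
global_phase = cmath.exp(0.7j)   # arbitrary unit-modulus constant (illustrative)
plus_shifted = [global_phase * amp for amp in plus]

# A global phase leaves the outcome statistics unchanged, even after a gate.
p_plus = probabilities(hadamard(plus))
p_shifted = probabilities(hadamard(plus_shifted))
# A relative phase changes which outcome the same gate produces.
p_minus = probabilities(hadamard(minus))
```

Here `p_plus` and `p_shifted` agree, while `p_minus` puts all the probability on the other outcome.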

A  quantum  gate  is  a  reversible  transformation  of  a 
quantum  state  that  preserves  total  probability — for 
example, for a single qubit $|a|^2 + |b|^2 = 1$. Quantum gates
are  represented  by  unitary  matrices  that  act  on  quan¬ 
tum  state  vectors  by  left  multiplication.  Gates  are  con¬ 
nected  by  quantum  wires  that  transport  qubits  forward 
in  time  or  space.  Quantum  wires  cannot  fan  out — that 
is,  qubits  with  unknown  state  cannot  be  duplicated. 
Matrix  multiplication  models  composition  of  gates  in 
series;  the  Kronecker,  or  tensor,  product  models  com¬ 
position  of  gates  in  parallel. 
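
As a rough illustration of series and parallel composition, the sketch below uses plain Python lists as matrices. The function names (`matmul`, `kron`, `apply`) are assumptions made for this example rather than an interface from the article.

```python
import math

def matmul(a, b):
    """Matrix product a @ b: gate b acts first, then gate a (series)."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]

def kron(a, b):
    """Kronecker product: gate a on the first qubit, gate b on the second (parallel)."""
    return [[a[i][j] * b[k][l]
             for j in range(len(a[0])) for l in range(len(b[0]))]
            for i in range(len(a)) for k in range(len(b))]

def apply(gate, state):
    """Left-multiply a state vector by a gate matrix."""
    return [sum(g * s for g, s in zip(row, state)) for row in gate]

def norm_squared(state):
    """Total probability carried by a state vector."""
    return sum(abs(amp) ** 2 for amp in state)

S = 1 / math.sqrt(2)
H = [[S, S], [S, -S]]      # Hadamard
X = [[0, 1], [1, 0]]       # NOT
I2 = [[1, 0], [0, 1]]      # identity (an idle wire)

x_then_h = matmul(H, X)                   # series: X followed by H
h_and_idle = kron(H, I2)                  # parallel: H on qubit 0, nothing on qubit 1
state = apply(h_and_idle, [1, 0, 0, 0])   # start from |00>
```

The resulting two-qubit state still has unit norm, as expected for a unitary gate.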

Inaccurate  gates  and  uncontrolled  environmental  cou¬ 
plings  introduce  data  errors.  Uncontrolled  coupling  results 
in  decoherence,  which  causes  qubits  to  collapse  to  states 
that  behave  probabilistically,  like  (possibly  biased)  classi¬ 
cal  coins.  Such  states  have  no  phase  information  and  can¬ 
not  perform  quantum  computation.  These  effects  compli¬ 
cate  quantum  information  processing,  but  researchers  can 
address  them  using  tools  that  perform  optimizations  and 
automatically  add  error  correction. 
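
One simple way to picture decoherence is a phase-flip (dephasing) channel acting on a single-qubit density matrix. This is a sketch under that assumed noise model, with a per-step flip probability of 0.1 chosen only for illustration; real devices have richer noise.

```python
def dephase(rho, q):
    """Phase-flip channel on a 2x2 density matrix: apply Z with probability q.

    The diagonal is unchanged; off-diagonal terms shrink by (1 - 2q).
    """
    (a, b), (c, d) = rho
    keep = 1 - 2 * q
    return [[a, keep * b], [keep * c, d]]

def prob_plus(rho):
    """Probability of the '+' outcome when measuring in the Hadamard basis."""
    (a, b), (c, d) = rho
    return (a + b + c + d) / 2

rho = [[0.5, 0.5], [0.5, 0.5]]      # the pure state |+><+|
history = [prob_plus(rho)]
for _ in range(50):
    rho = dephase(rho, 0.1)         # assumed flip probability per step
    history.append(prob_plus(rho))
```

The '+' outcome starts out certain and drifts toward 1/2: the qubit ends up behaving like a fair classical coin with no usable phase information.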

FOUR-PHASE  DESIGN  FLOW 

We  envision  a  hierarchy  of  design  tools  with  simple 
interfaces  between  layers  that  include  programming  lan¬ 
guages,  compilers,  optimizers,  simulators,  and  layout 
tools.  Such  an  architecture  appears  necessary  because 
no  single  entity  can  afford  the  huge  investments  required 
to  develop  all  necessary  tools.  To  this  end,  open  source 
software  encourages  wider  community  participation. 
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A  sufficiently  transparent 
architecture  facilitates  tool  inter¬ 
operability,  focused  point-tool 
development,  and  incremental 
improvements.  Quantum  algo¬ 
rithm  designers  and  those  devel¬ 
oping  quantum  circuit  optimi¬ 
zations  can  explore  new  algo¬ 
rithms  and  error-correction  pro¬ 
cedures  in  more  realistic  settings 
involving  actual  noise  and  phys¬ 
ical  resource  constraints.  Re¬ 
searchers  can  also  simulate  im¬ 
portant  quantum  algorithms  on 
proposed  new  technologies  be¬ 
fore  doing  expensive  lab  experiments. 

Our  four-phase  design  flow,  shown  in  Figure  1,  maps 
a  high-level  program  representing  a  quantum  algorithm 
into  a  low-level  set  of  machine  instructions  to  be  imple¬ 
mented  on  a  physical  device.  The  high-level  quantum 
programming  language  encapsulates  the  mathematical 
abstractions  of  quantum  mechanics  and  linear  algebra.1 
The  design  flow’s  first  three  phases  are  part  of  the  quan¬ 
tum  computer  compiler  (QCC).  The  last  phase  imple¬ 
ments  the  algorithm  on  a  quantum  device  or  simulator. 

In  addition  to  providing  support  for  the  abstractions 
used  to  specify  quantum  algorithms,  the  programming 
languages  and  compilers  at  the  top  level  of  our  tool  suite 
accommodate  optimization  improvements  as  our  under¬ 
standing  of  new  quantum  technologies  matures.  The 
simulation  and  layout  tools  at  the  bottom  level  incor¬ 
porate  details  of  the  emerging  quantum  technologies 
that  would  ultimately  implement  the  algorithms 
described  in  the  high-level  language.  The  tools  balance 
tradeoffs  involving  performance,  qubit  minimization, 
and  fault-tolerant  implementations. 

The  representations  of  the  quantum  algorithm 
between  the  phases  are  the  key  to  an  interoperable  tools 
hierarchy.  In  the  first  phase,  the  compiler  front  end  maps 
a  high-level  specification  of  a  quantum  algorithm  into  a 
quantum  intermediate  representation  (QIR) — a  quan¬ 
tum  circuit  with  gates  drawn  from  some  universal  set. 
Compared  to  traditional  logic  circuits,  quantum  circuits 
are  more  structured  and  typically  have  intrinsic  sequen¬ 
tial  semantics,  wherein  gates  modify  globally  maintained 
state  qubits  in  parallel. 

In  the  second  phase,  a  technology-independent  opti¬ 
mizer  maps  the  QIR  into  an  equivalent  lower-level  cir¬ 
cuit  representation  of  single-qubit  and  controlled-NOT 
(CNOT)  gates.  The  compiler  optimizes  this  Quantum 
Assembly  Language  (QASM)  according  to  a  cost  func¬ 
tion  such  as  circuit  size,  circuit  depth,  or  accuracy.  Since 
limiting  quantum  computing  to  a  fixed  set  of  registers 
and  fixed  word  size  would  significantly  restrict  its 
power,  QASM  does  not  have  such  limitations,  unlike 
traditional  assembly  languages.  Therefore,  parallelism 
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has  a  greater  impact  and  must  be  extracted  by  the 
compiler. 
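
To make the size and depth cost functions concrete, here is a small sketch of as-soon-as-possible layering, one simple way a compiler might expose parallelism. The gate-tuple format and the function name `schedule_asap` are assumptions for this example, not the QCC's actual representation.

```python
def schedule_asap(gates):
    """Place each gate in the earliest layer after all earlier gates on its qubits."""
    ready = {}      # qubit -> index of the first layer in which it is free
    layers = []
    for name, qubits in gates:
        layer = max((ready.get(q, 0) for q in qubits), default=0)
        if layer == len(layers):
            layers.append([])
        layers[layer].append((name, qubits))
        for q in qubits:
            ready[q] = layer + 1
    return layers

# Hypothetical five-gate circuit on four qubits.
circuit = [
    ("H", (0,)),
    ("H", (1,)),
    ("CNOT", (0, 2)),
    ("CNOT", (1, 3)),
    ("CNOT", (2, 3)),
]
layers = schedule_asap(circuit)
size = len(circuit)     # circuit size: total gate count
depth = len(layers)     # circuit depth: number of sequential layers
```

Gates that act on disjoint qubits share a layer, so the depth can be much smaller than the size.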

The  third  phase  consists  of  optimizations  suited  to  the 
quantum  computing  technology  and  outputs  Quantum 
Physical  Operations  Language  (QPOL),  a  physical-lan¬ 
guage  representation  with  technology-specific  parameters. 
QPOL  includes  two  subphases:  The  first  maps  the  repre¬ 
sentation  of  single-qubit  and  CNOT  gates  into  a  QASM 
representation  using  a  fault-tolerant  discrete  universal  set 
of  gates;  the  second  maps  these  gates  into  a  QPOL  repre¬ 
sentation  containing  the  physical  instructions  for  the  fault- 
tolerant  operations  scheduled  in  parallel,  including  the 
required  movements  of  physical  par¬ 
ticles.  Knowledge  of  the  physical  lay¬ 
out  and  architectural  limitations 
enters  no  later  than  at  this  step. 

The  final  phase  utilizes  technology- 
dependent  tools  such  as  layout  mod¬ 
ules,  circuit  and  physical  simulators, 
or  interfaces  to  actual  quantum 
devices.  If  at  this  point  certain  tech¬ 
nology  constraints  or  objectives  have 
not  been  met,  algorithm  and  device  designers  can  repeat 
some  earlier  phases.  In  addition,  it  is  possible  to  add  fault 
tolerance  and  error  correction  at  multiple  phases  of  the 
design  process. 

The  “Sample  Design  Flow:  EPR  Pair  Creation”  side- 
bar  provides  a  concrete  example  of  how  our  proposed 
design  flow  automates  the  process  of  transforming 
mathematical  models  into  software  for  controlling  a  live 
quantum-mechanical  system. 

PROGRAMMING  ENVIRONMENT 
AND  LANGUAGE 

Designing  a  quantum  programming  environment  is 
difficult  given  the  currently  limited  repertoire  of  quan¬ 
tum  algorithms.  However,  this  situation  is  likely  to 
improve  as  the  demand  for  nanoscale  control  increases. 
The  programming  model  is  also  uncertain  because 
researchers  can  design  a  quantum  computer  as  either  an 
application-specific  integrated  circuit  or  a  general-pur¬ 
pose  processor.  However,  it  is  safe  to  assume  that  clas¬ 
sical  computers  will  monitor  quantum  devices  through 
a  bidirectional  communication  link.2 

A  quantum  programming  environment  should  pos¬ 
sess  several  key  characteristics.2  First,  it  needs  a  high- 
level  quantum  programming  language  that  offers  the 
necessary  abstractions  to  perform  useful  quantum  oper¬ 
ations.  It  should  support  complex  numbers,  quantum 
unitary  transforms  (quantum  gates),  and  measurements 
as  well  as  classical  pre-  and  postprocessing.  Support  for 
reusable  subroutines  and  gate  libraries  is  also  required. 
However,  the  exact  modularization  of  a  quantum  pro¬ 
gramming  environment  remains  an  open  question. 

In  addition,  the  environment  as  well  as  the  program¬ 
ming  language  should  be  based  on  familiar  concepts  and 


constructs.  This  would  make  learning  how  to  write, 
debug,  and  run  a  quantum  program  easier  than  using  a 
totally  new  environment. 

The  quantum  programming  environment  also  should 
allow  easy  separation  of  classical  and  quantum  compu¬ 
tations.  Because  a  quantum  computer  has  noise  and  lim¬ 
ited  coherence  time,  this  separation  can  limit  computa¬ 
tion  time  on  the  quantum  device.  The  compiler  for  a 
quantum  programming  language  should  be  able  to  trans¬ 
late  a  source  program  into  an  efficient  and  robust  quan¬ 
tum  circuit  or  physical  implementation;  it  should  be  easy 
to  translate  into  different  gate  sets  or  optimize  with 
respect  to  a  desired  cost  function. 

Further,  the  high-level  program¬ 
ming  language  should  be  hardware- 
independent  and  compile  onto  dif¬ 
ferent  quantum  technologies.  How¬ 
ever,  the  language  and  environment 
should  allow  the  inclusion  of  tech¬ 
nology-specific  modules. 

A  language  that  supports  high- 
level  abstractions  would  facilitate 
development  of  new  quantum  algorithms  and  applica¬ 
tions.  Researchers  have  proposed  many  quantum  pro¬ 
gramming  languages  based  on  the  quantum  circuit 
model,2,3 but a language that provides further insights
on  quantum  information  processing  is  needed.  We  also 
seek  a  language  that  simplifies  creation  of  robust,  opti¬ 
mized  target  programs. 

QUANTUM  COMPUTER  COMPILER 

A  generic  compiler  for  a  classical  language  on  a  clas¬ 
sical  machine  consists  of  a  sequence  of  phases  that 
transform  the  source  program  from  one  representation 
into  another.4  This  partitioning  of  the  compilation 
process  has  led  to  the  development  of  efficient  algo¬ 
rithms  and  tools  for  each  phase.  Because  the  front-end 
processes  for  QCCs  are  similar  to  those  of  classical 
compilers,  researchers  can  use  the  algorithms  and  tools 
to  build  lexical,  syntactic,  and  semantic  analyzers  for 
QCCs.  However,  the  intermediate  representations,  the 
optimization  phase,  and  the  code-generation  phase  of 
QCCs  differ  greatly  from  classical  compilers  and 
require  novel  approaches,  such  as  a  way  to  insert  error- 
correction  operations  into  the  target  language  program. 

Quantum  intermediate  representation 

Other  popular  quantum  computation  models,  such  as 
adiabatic  quantum  computing,  can  be  converted  to 
quantum  circuits.  Therefore,  in  our  design  flow’s  first 
phase,  the  QCC’s  front  end  maps  a  high-level  specifica¬ 
tion  of  a  quantum  algorithm  into  a  QIR  based  on  the 
quantum  circuit  model.1 

Provisions  must  be  made  in  the  QIR  for  classical  and 
quantum  control  flows  as  well  as  data  flows.  In  partic¬ 
ular,  quantum-to-classical  conversions  are  accomplished 
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via  quantum  measurements,  while  quantum  condition¬ 
als  and  entangled  switch  statements  are  implemented 
using  quantum  multiplexer  gates.5  High-level  optimiza¬ 
tions  may  involve  simultaneous  changes  to  quantum  and 
classical  control  flows  and  to  data  flows.  We  also  con¬ 
sider  fault-tolerant  constructions  at  various  phases  in 
the  design  flow  and  incorporate  circuit  synthesis  and 
optimization  techniques  in  both  the  technology-inde¬ 
pendent  and  technology-dependent  phases. 

Circuit  synthesis  and  optimization 

During  the  second  and  third  phases,  the  QCC  syn¬ 
thesizes  and  optimizes  a  QASM  representation  of  a 
quantum  circuit  using  procedures  similar  to  those  cur¬ 
rently  used  for  digital  circuits.  Algorithms  for  classical 
logic  circuit  synthesis  map  a  Boolean  function  into  a  cir¬ 
cuit  using  gates  from  a  given  gate  library.  Similarly, 
quantum  circuit  synthesis  creates  a  circuit  that  performs 
a  given  unitary  transform  up  to  an  irrelevant  global 
phase  or  a  prescribed  quantum  measurement. 

A digital logic designer can immediately construct a two-level
circuit of a Boolean function, linear in the size of the
function’s truth table, and then use various techniques to
optimize it. In contrast, finding a good quantum circuit to
implement a $2^n \times 2^n$ unitary matrix is difficult. Only
very recently have constructive algorithms become available that
yield an asymptotically optimal circuit with $O(4^n)$ gates.
Because CNOT gates are typically most expensive, their counts
have been pushed down to only a factor of two away from lower
bounds.5 Remaining gates operate on single qubits at a time, but
unlike CNOT gates their functionality can be tuned using
continuous parameters.
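
The exponential gate count can be motivated by a standard dimension-counting argument from the circuit-synthesis literature. It is sketched here as background and is not derived in this article. An $n$-qubit unitary has $4^n - 1$ real parameters up to global phase; the initial layer of single-qubit gates supplies at most $3n$ of them, and each CNOT can be followed by single-qubit gates that add at most 4 new parameters, because one rotation per gate commutes through the CNOT and is absorbed elsewhere. With $c$ denoting the number of CNOT gates:

```latex
\[
  3n + 4c \;\ge\; 4^n - 1
  \quad\Longrightarrow\quad
  c \;\ge\; \left\lceil \tfrac{1}{4}\left(4^n - 3n - 1\right) \right\rceil .
\]
% Example: for n = 2 the bound gives c >= ceil(9/4) = 3 CNOT gates.
```

Any circuit family that implements arbitrary unitaries therefore needs a number of CNOT gates growing as $4^n$.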

When  developing  reusable  software  for  automating 
quantum  circuit  design,  reducing  technological  depen¬ 
dence  is  desirable.  Today,  the  NAND  gate  is  easier  to 
implement  than  the  AND  gate  in  CMOS-based  inte¬ 
grated  circuits.  Commercial  circuit  synthesis  tools 
address  this  by  decoupling  libraryless  logic  synthesis  from 
technology  mapping.  The  former  step  uses  an  abstract 
gate  library,  such  as  AND-OR-NOT,  and  emphasizes  the 
scalability  of  synthesis  algorithms  that  capture  the  given 
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Accurately  capturing  quantum-mechanical  systems  using  tra¬ 
ditional 0s and 1s is inherently difficult. Quantum information
must  therefore  be  processed  directly — without  converting  it 
to  bits — during  state  transformation  and  teleportation,  com¬ 
munication,  measurements,  and  other  common  tasks. 

Figure  A  illustrates  how  our  proposed  four-phase  design  flow 
automates  the  transformation  of  mathematical  models  into  soft¬ 
ware  for  controlling  a  live  physical  system. 

An  algorithm  designer,  researcher,  or  engineer  initially 
expresses  a  mathematical  specification  of  a  quantum  algorithm 
in  a  high-level  quantum  programming  language,  automatically 
creating  a  quantum  circuit  that  encapsulates  the  mathematical 
abstractions  of  quantum  mechanics  and  linear  algebra. 

In  the  design  flow’s  first  phase,  the  quantum  computer  com¬ 
piler  abstracts  the  quantum  circuit  as  a  quantum  intermediate 
representation  (QIR).  Next,  the  QCC  translates  the  circuit 
into  Quantum  Assembly  Language  (QASM)  that  captures  a  uni¬ 
versal  set  of  quantum  gates.  In  the  third  phase,  the  QCC  trans¬ 
lates  QASM  instructions  into  Quantum  Physical  Operations 
Language  using  software  tools.  QPOL  has  knowledge  of  par¬ 
ticulars  of  the  quantum  device,  including  layout  and  a  technol¬ 
ogy-specific  gate  library.  Finally,  technology-dependent  software 
tools  translate  QPOL  into  machine  instructions. 

In this example, we demonstrate how to produce Einstein-Podolsky-Rosen
(EPR) pairs1 for implementation on a trapped-ion computer.
Trapped-ion systems have shown considerable potential as a future
quantum computing technology.2 These computers use charged,
electromagnetically trapped atoms as qubit carriers and the
internal state of single ionized atoms as qubits. Ions can be
shuttled in and out of ion traps to increase the quantum
computer’s effective size.

An important physical resource for quantum computing and
communication, EPR pairs are entangled quantum states that cannot
be decomposed into (tensor products of) single-qubit states. They
represent quantum nonlocality and have applications in quantum
state teleportation, ultraprecise measurement, and lithography as
well as in a number of quantum computing algorithms.
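
For two qubits in a pure state, the statement that an EPR pair cannot be decomposed into single-qubit states has a simple numerical check: the state factors exactly when the 2 x 2 matrix of its amplitudes has zero determinant. The sketch below assumes the amplitude ordering [a00, a01, a10, a11]; the function name is illustrative.

```python
import math

def is_product_state(state, tol=1e-12):
    """True if a two-qubit pure state [a00, a01, a10, a11] is a tensor product.

    The state factors into single-qubit states iff a00*a11 - a01*a10 == 0.
    """
    a00, a01, a10, a11 = state
    return abs(a00 * a11 - a01 * a10) < tol

S = 1 / math.sqrt(2)
epr_pair = [S, 0, 0, S]     # (|00> + |11>)/sqrt(2)
separable = [S, 0, S, 0]    # ((|0> + |1>)/sqrt(2)) tensor |0>

epr_is_product = is_product_state(epr_pair)
separable_is_product = is_product_state(separable)
```

The EPR pair fails the test, while the separable state passes it.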

For EPR pair creation, we abstract the mathematical
representation in a quantum circuit composed of a Hadamard (H) and
CNOT gate. The figure shows sample QASM and QPOL representations.
Determining the phase in which to insert fault tolerance and
error correction is an open research question; here we show how
to replace a CNOT gate with a circuit for a fault-tolerant
encoded CNOT operation limited to local interactions.

QPOL instructions for creating an EPR pair can be translated
into a sequence of laser pulses, in this case for performing a
CNOT gate on an ion-trap device. The machine instructions are
as follows:

1. Alternately raise and lower the potentials of electrodes A,
1, 2, and 3 to move ions from trap A to trap B.

2. Apply a laser to the “green” ion to cool the ion chain that
may have heated during movement.

3. Apply π pulse on the first red sideband of the x ion.

4. Apply pulse on carrier of the y ion.

5. Apply π pulse on the first red sideband of the x ion.

6. Split the “green” ion and the x ion away from the y ion and
move them back to trap A.

The six-step process could take around 10-100 μs.




Computer 


computation’s  global  structure.  The  latter  step  converts 
all  gates  of  a  logic  circuit  to  gates  from  a  technology-spe¬ 
cific  gate  library,  often  supplied  by  a  chip  manufacturer, 
and  is  based  on  local  optimizations. 

We  expect  the  distinction  between  technology-inde¬ 
pendent  circuit  synthesis  and  technology  mapping  to 
carry  over  to  quantum  circuits.6  This  is  precisely  why 
the  QCC  maps  the  quantum  algorithm  into  a  QASM 
representation  consisting  of  single-qubit  and  CNOT 
gates  in  the  second  phase  of  our  design  flow. 

In  addition,  temporary  decompositions  into  elemen¬ 
tary  gates  could  help  optimize  pulse  sequences  and 
reduce  systematic  inaccuracies  in  physical  implementa¬ 
tions.  For  example,  a  CNOT  gate  can  be  mapped  onto 
a  specific  technology  by  appropriately  timing  pulses  that 
couple  two  qubits,  with  pre-  and  postprocessing  by  less 
sophisticated  pulses  that  affect  single  qubits.6 

Technology-mapped  circuits  could  potentially  be  opti¬ 
mized  further  via  automatic  instantiation  of  error  cor¬ 
rection,  efficient  handling  of  universal  gate  libraries 
without  tunable  gates,  and  identification  of  reusable 


quantum  logic  blocks  and  their  efficient  implementation. 

Quantum  Assembly  Language 

During  the  technology-independent  phase  of  our 
design  flow,  the  QCC  maps  a  representation  of  the  quan¬ 
tum  algorithm  into  an  equivalent  set  of  Quantum 
Assembly  Language  instructions.  QASM  is  a  classical 
reduced-instruction-set  computing  assembly  language 
extended  by  a  set  of  quantum  instructions  based  on  the 
quantum  circuit  model.  It  uses  qubits  and  registers  of 
classical  bits  (cbits)  as  static  units  of  information  that 
must  be  declared  at  the  program’s  beginning.  Quantum 
instructions  in  QASM  consist  solely  of  single-qubit  uni¬ 
tary  gates,  CNOT  gates,  and  measurements.  Any  quan¬ 
tum  circuit  can  be  constructed  using  these  instructions. 
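
The sketch below interprets a toy QASM-like program to show how declarations, single-qubit gates, and CNOT gates fit together. The tuple syntax and the names `run_qasm` and `apply_gate` are hypothetical and do not reproduce the article's QASM syntax; measurement is omitted to keep the example deterministic.

```python
import math

S = 1 / math.sqrt(2)

def to_index(bits):
    """Pack a list of bits (qubit 0 first) into a basis-state index."""
    index = 0
    for b in bits:
        index = (index << 1) | b
    return index

def apply_gate(state, n, op, idx):
    """Apply one gate to an n-qubit state vector (qubit 0 is the leftmost bit)."""
    new = [0j] * len(state)
    for basis, amp in enumerate(state):
        if amp == 0:
            continue
        bits = [(basis >> (n - 1 - k)) & 1 for k in range(n)]
        if op == "x":
            bits[idx[0]] ^= 1
            new[to_index(bits)] += amp
        elif op == "cnot":
            control, target = idx
            bits[target] ^= bits[control]
            new[to_index(bits)] += amp
        elif op == "h":
            q = idx[0]
            sign = -1 if bits[q] else 1     # H|1> carries a minus sign on |1>
            for value in (0, 1):
                out = bits[:]
                out[q] = value
                new[to_index(out)] += amp * S * (sign if value else 1)
        else:
            raise ValueError(f"unknown instruction: {op}")
    return new

def run_qasm(program):
    """Run a toy program; qubits and cbits must be declared before any gate."""
    qubits, cbits, gates = [], [], []
    for op, *args in program:
        if op in ("qubit", "cbit"):
            if gates:
                raise ValueError("declarations must come first")
            (qubits if op == "qubit" else cbits).append(args[0])
        else:
            unknown = [q for q in args if q not in qubits]
            if unknown:
                raise ValueError(f"undeclared qubit(s): {unknown}")
            gates.append((op, [qubits.index(q) for q in args]))
    n = len(qubits)
    state = [0j] * (2 ** n)
    state[0] = 1 + 0j                       # all qubits start in |0>
    for op, idx in gates:
        state = apply_gate(state, n, op, idx)
    return state

epr_program = [("qubit", "x"), ("qubit", "y"), ("h", "x"), ("cnot", "x", "y")]
epr_state = run_qasm(epr_program)
```

Running the two-gate program yields equal amplitudes on |00> and |11>, the EPR pair discussed in the sidebar.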

Quantum  Physical  Operations  Language 

QPOL  precisely  describes  the  execution  of  a  given 
quantum  algorithm  expressed  as  a  QASM  program  on 
a  particular  technology,  like  trapped-ion  systems.  QPOL 
includes  physical  operations  as  well  as  technology- 
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Figure  A.  Using  quantum  information  processing  to  control  live  physical  systems.  Proposed  four-phase  design  flow,  detailed  for 
EPR  pair  creation  on  a  trapped-ion  computer  with  machine  instructions  translated  into  a  sequence  of  laser  pulses  that  perform  a 
CNOT  gate.  A  feedback  loop  allows  for  repetition  of  earlier  phases. 
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Figure  2.  Trapped-ion  simulator.  Graphical  display  shows  an 
H-tree layout. Qubits are electronic states of ions, represented
by  spheres,  and  gates  are  laser  pulses,  represented  by  lines. 

The  qubits  can  move  within  the  black  regions  but  not  into  the 
substrate,  drawn  using  light  squares. 

specific  modules.  In  particular,  it  organizes  physical 
operations  into  five  instruction  types: 

• Initialization instructions specify how to prepare the
initial system state. This can include loading qubits
into the quantum computer, initializing auxiliary
physical states used in computations, and setting
qubits to $|0\rangle$.

•  Computation  instructions  include  quantum  gates  and 
measurements. 

•  Movement  instructions  control  the  relative  distance 
between  qubits  to  bring  them  together  to  undergo 
simultaneous  operations  or  move  them  apart. 

•  Classical  control  instructions  provide  simple  logic 
operations  and  allow  quantum  gates  to  be  applied 
based  on  classical  bit  values  stored  in  classical 
memory. 

•  System-specific  instructions  control  physical  para¬ 
meters  of  the  system  that  do  not  explicitly  fall  into 
the  other  categories. 

The final QPOL program is obtained by distributing these
instructions among the available instruction processing units
(highly parallel quantum computers will have many) and by
inserting appropriate waiting times.
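
The five instruction types and the distribution step can be sketched as follows. Everything here is an illustrative assumption: the class names, the greedy in-order policy, and the durations (abstract time units) are placeholders, not QPOL's actual format or timing data.

```python
from dataclasses import dataclass
from enum import Enum

class Kind(Enum):
    INITIALIZATION = "init"
    COMPUTATION = "compute"
    MOVEMENT = "move"
    CLASSICAL_CONTROL = "classical"
    SYSTEM_SPECIFIC = "system"

@dataclass
class Instruction:
    kind: Kind
    qubits: tuple
    duration: int   # abstract time units, made up for illustration

def distribute(program, n_units):
    """Greedy in-order assignment of instructions to processing units.

    An instruction starts once its unit is free and all its qubits are free;
    any gap on the unit is recorded as an explicit waiting time.
    """
    unit_free = [0] * n_units
    qubit_free = {}
    schedule = []
    for instr in program:
        unit = min(range(n_units), key=lambda u: unit_free[u])
        ready = max((qubit_free.get(q, 0) for q in instr.qubits), default=0)
        start = max(ready, unit_free[unit])
        wait = start - unit_free[unit]
        end = start + instr.duration
        unit_free[unit] = end
        for q in instr.qubits:
            qubit_free[q] = end
        schedule.append((unit, wait, start, instr))
    return schedule, max(unit_free)

program = [
    Instruction(Kind.INITIALIZATION, ("x",), 3),
    Instruction(Kind.INITIALIZATION, ("y",), 3),
    Instruction(Kind.MOVEMENT, ("x",), 2),
    Instruction(Kind.COMPUTATION, ("x", "y"), 1),
]
schedule, makespan = distribute(program, n_units=2)
```

In this run the two-qubit gate lands on the second unit, which must wait two time units until the moved ion arrives.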

In  the  case  of  trapped-ion  computers,  initialization 
has  three  stages:  loading  of  multiple  ions  into  a  loading 
region,  laser  cooling  to  reduce  ion  temperatures,  and 
optical  pumping  to  put  all  qubits  into  a  known  state. 

Computation is naturally described in terms of single-qubit
rotations and a controlled-phase gate between ions in the same
trap, both achieved using a laser pulse sequence. Measurement
uses another laser pulse that causes ions in the $|0\rangle$
state to fluoresce. Electrostatic fields can move ions between
multiplexed traps, and they can move multiple ions in and out of
the same trap.


An  external  classical  processor  controls  the  execution 
of  QPOL  instructions,  stores  measurement  results,  and 
performs  conditional  instructions  based  on  stored  cbits. 

System-specific  instructions  recool  ions  when  they 
heat  due  to  movement  operations.  Certain  laser  pulses 
also  accomplish  recooling,  but  the  lasers  are  applied  dif¬ 
ferently  for  cooling  than  for  gates,  requiring  different 
programming  and  pulse-sequence  optimization. 

HIGH-PERFORMANCE  SIMULATION 
OF  QUANTUM  CIRCUITS 

Quantum-mechanical  effects  are  useful  for  accelerat¬ 
ing  certain  classical  computations,  as  Lov  Grover7  and 
Peter  Shor8  have  shown;  however,  numerical  simulation 
of  quantum  computers  on  classical  computers  remains 
important  for  engineering  reasons. 

In  classical  electronic  design  automation,  chip  design¬ 
ers  always  test  independent  modules  and  complete  sys¬ 
tems  by  simulating  them  on  test  vectors  before  costly 
manufacturing.  Numerical  simulations  can  also  help  to 
evaluate  quantum  heuristics  that  defy  formal  worst-case 
analysis  or  only  work  well  for  a  fraction  of  inputs. 

For  the  numerical  simulation  phase  of  our  design  flow, 
we  again  use  the  quantum  circuit  formalism.  Because 
mathematical  models  of  quantum  states,  quantum  gates, 
and  measurement  involve  linear  algebra,  a  key  aspect  of 
efficient  simulation  is  exploiting  the  structure  in  the  matri¬ 
ces  and  vectors  derived  from  quantum  circuits.  To  this 
end,  researchers  have  proposed  polynomial-time  simula¬ 
tion  techniques  for  circuits  arising  in  error  correction9 
and  for  “slightly  entangled”  quantum  computation. 

QuIDDPro:  A  generic  graph-based  simulator 

George  Viamontes  and  colleagues10  have  proposed  a 
generic  simulation  technique  based  on  data  compres¬ 
sion  using  the  quantum  information  decision  diagram 
(QuIDD)  data  structure.  Its  worst-case  performance  is 
no  better  than  what  can  be  achieved  with  basic  linear 
algebra,  but  it  can  dramatically  compress  structured  vec¬ 
tors  and  matrices,  including  all  basis  states,  small  gates, 
and  some  tensor  products. 

A  QuIDD  is  a  directed  acyclic  graph  with  one  source 
and  multiple  sinks,  each  labeled  with  a  complex  num¬ 
ber.  The  graph  models  matrix  and  vector  elements  as 
directed  paths;  any  given  vector  or  matrix  can  be  encoded 
as  a  QuIDD  and  vice  versa.  Graph  algorithms  working 
on  QuIDDs,  supplied  as  a  software  library,  implement 
all  linear-algebraic  operations  in  terms  of  compressed 
data  representations. 
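
The flavor of this compression can be shown with a much simplified, vector-only decision diagram. The sketch is an assumption-laden toy, not the QuIDD data structure itself: it builds the diagram from an explicit vector, whereas the QuIDD library operates on compressed data directly and also handles matrices.

```python
def build_dd(vec, table):
    """Return a node id for vec, sharing identical sub-vectors through table."""
    if len(vec) == 1:
        key = ("leaf", vec[0])
    else:
        half = len(vec) // 2
        low = build_dd(vec[:half], table)
        high = build_dd(vec[half:], table)
        if low == high:
            return low          # both halves identical: no decision node needed
        key = ("node", len(vec), low, high)
    if key not in table:
        table[key] = len(table)
    return table[key]

def dd_size(vec):
    """Number of distinct nodes (leaves plus decision nodes) needed for vec."""
    table = {}
    build_dd(vec, table)
    return len(table)

n = 10
uniform = [2 ** (-n / 2)] * (2 ** n)            # equal superposition of all strings
basis = [1.0] + [0.0] * (2 ** n - 1)            # the basis state |00...0>
generic = [float(i) for i in range(2 ** n)]     # unstructured: all entries distinct

sizes = (dd_size(uniform), dd_size(basis), dd_size(generic))
```

The structured vectors need 1 and n + 2 nodes respectively, while the unstructured vector needs 2^(n+1) - 1, mirroring the worst case noted above.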

Time  and  memory  used  by  these  algorithms  to  simu¬ 
late  a  useful  class  of  quantum  circuits  scale  polynomially 
with  the  number  of  qubits.  All  components  of  Grover’s 
algorithm,  except  for  some  application-dependent  ora¬ 
cles,  fall  into  this  class.  QuIDD-based  simulation  of  the 
algorithm  requires  time  and  memory  resources  that  are 
polynomial  in  the  oracle  function’s  size.  If  a  compact 
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QuIDD  can  represent  a  particular  oracle  function  for 
some  search  problem,  then  classical  simulation  of  the 
algorithm  runs  nearly  as  fast  as  an  ideal  quantum  circuit. 

QuIDDs can also simulate density matrices by implementing
several additional operations, such as trace-overs, in terms of
graph traversals.10 Straightforward modeling of any 16-qubit
density matrix would require 64 Gbytes of memory ($2^{32}$
complex entries at 16 bytes each). In contrast, for a reversible
16-qubit adder circuit using CNOT and Toffoli gates, the
QuIDDPro package (http://vlsicad.eecs.umich.edu/quantum/qp)
requires less than 5 Mbytes.

Trapped-ion  simulator 

Numerical  simulations  of  quantum  systems  are  also 
useful  when  studying  the  feasibility  or  performance  of 
specific  physical  implementations.9  We  have  carried  out 
such  a  simulation  for  trapped-ion  systems  with  up  to 
1,000 qubits; this applies to quantum stabilizer circuits,
which  are  central  to  quantum  error  correction. 

The keys to such realistic simulations are the layout of
qubits in physical space and the scheduling of operations. Our
layout tool maps circuits onto an H-tree, a recursively
constructed fractal layout. This reduces the movement operations
required per gate by keeping qubits in inner codes near one
another within concatenated quantum codes, which also have a
self-similar structure. Our scheduler tool uses implicitly
specified paths to optimize for minimal distances, expanding
QASM instructions to include movements.
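
A recursive H-tree placement can be sketched in a few lines. The coordinates, the use of Manhattan distance as a stand-in for movement cost, and the function names are assumptions for illustration; the actual layout tool is not described at this level of detail.

```python
from itertools import combinations

def h_tree_leaves(levels, cx=0.0, cy=0.0, half=1.0):
    """Leaf positions of an H-tree: each center spawns four half-size sub-trees."""
    if levels == 0:
        return [(cx, cy)]
    leaves = []
    for dx in (-half, half):
        for dy in (-half, half):
            leaves += h_tree_leaves(levels - 1, cx + dx, cy + dy, half / 2)
    return leaves

def mean_distance(leaves, same_block, block_size=4):
    """Mean Manhattan distance over leaf pairs inside (or across) innermost blocks."""
    total, count = 0.0, 0
    for (i, a), (j, b) in combinations(enumerate(leaves), 2):
        if (i // block_size == j // block_size) == same_block:
            total += abs(a[0] - b[0]) + abs(a[1] - b[1])
            count += 1
    return total / count

leaves = h_tree_leaves(2)                 # 16 sites in four blocks of four
inside = mean_distance(leaves, True)      # pairs within one innermost block
across = mean_distance(leaves, False)     # pairs in different blocks
```

Sites in the same innermost block are on average markedly closer than sites in different blocks, which is the property the layout relies on when inner-code qubits interact most often.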

The simulator output includes the final quantum state
(for circuit verification), measurement and failure histories,
total execution time, and, in the case of a fault-tolerant
circuit, validity of the final output. As Figure 2 shows, the
output also includes a graphical display of QPOL instructions
as they are simulated.

DESIGN  FLOW  FOR  FAULT-TOLERANT 
ARCHITECTURES 

The inherently noisy nature of quantum computers requires
inserting error-correction routines and replacing gates with
their fault-tolerant implementations to achieve scalability. A
system architect can apply this process manually, synthesizing
and laying out each fault-tolerant gate (architecture-driven
design), or a compiler can apply it algorithmically
(software-driven design).

We  are  currently  considering  both  processes  for 
trapped-ion  computing  systems,  but  the  principles 
extend  to  other  physical  systems.  The  central  goal  of 
both  designs  is  to  guarantee  that  the  final  sequence  of 
physical  operations  will  execute  fault-tolerantly  on  the 
target  system — if  failures  occur  infrequently  enough, 
then  the  resulting  errors  cannot  cause  the  system  to  fail. 

Fault-tolerant  classical  components 

In  special  applications  of  modern  digital  computers, 
the  canonical  method  for  fault-tolerant  computation  is 


Figure 3. TMR fault-tolerant NAND gate at the second level of
recursion, constructed from three fault-tolerant NAND (N)
gates and three majority (M) gates.


triple modular redundancy.11 TMR involves feeding gate inputs
copied three times into three gates that fail with probability
$O(p)$. The output lines of these faulty gates fan out into
three majority voting gates. The majority gates essentially
amplify the correct value of the computation so that the
fault-tolerant gate fails only if two or more failures occur.
Mathematically, the fault-tolerant gate fails with probability
$O(p^2)$.

Figure 3 shows a TMR fault-tolerant NAND gate at the second
level of recursion, constructed from three fault-tolerant NAND
gates and three majority gates. All gates are assumed to fail
with probability $p$, such that the highlighted TMR NAND gate
fails with probability $< 6p^2$, ignoring input errors. The
entire circuit shown fails with probability $< 6^3 p^4$. If
$p < 1/6$, then this circuit is more reliable than a basic gate.

Applying TMR recursively k times, as illustrated in
Figure 3 for k = 2, fault-tolerant components can be
made to fail with probability bounded above by
$p(k) = (cp)^{2^k}/c$. The constant c is determined by the maximum
number of fault paths through the highlighted circuit
that lead the circuit to fail. In this case, c = 6 because at
least two gates or two majority voters must fail. If each
basic gate fails with probability $p < 1/c$, then $p(k) \to 0$
as $k \to \infty$. This construction exhibits a fault-tolerance
threshold $p_{th} = 1/c$.
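
The recursive bound above is easy to tabulate. The following is a minimal sketch, assuming the bound $(cp)^{2^k}/c$ with c = 6 as stated in the text; the function names `tmr_failure_bound` and `tmr_threshold` are illustrative and do not come from the original article.

```python
def tmr_failure_bound(p, k, c=6.0):
    """Upper bound (c*p)**(2**k) / c on the failure probability of a
    component built with k levels of recursive TMR, where p is the
    failure probability of a basic gate (k = 0 returns p itself)."""
    if k < 0:
        raise ValueError("recursion level k must be non-negative")
    return (c * p) ** (2 ** k) / c


def tmr_threshold(c=6.0):
    """Threshold 1/c: below it the bound shrinks as k grows."""
    return 1.0 / c


# Below the threshold (p = 0.01) the bound falls doubly exponentially in k;
# above it (p = 0.2) recursion makes the bound worse.
for p in (0.01, 0.2):
    print(p, [tmr_failure_bound(p, k) for k in range(4)])
```

For p = 0.01 the first level gives $6p^2$ and the second gives $6^3 p^4$, matching the two figures quoted for the circuit in Figure 3.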

Fault-tolerant  quantum  components 

We construct fault-tolerant quantum components using
procedures similar to classical fault-tolerance techniques.
They can encode quantum information using quantum
computation codes12 that allow fault-tolerant computation
via a discrete universal set of gates. Calderbank-Shor-Steane
codes are one family of quantum codes that allow
a transversal implementation of an encoded CNOT gate.
Transversal gates are always fault tolerant because they
are implemented in a bitwise fashion: a gate between a
pair of encoded qubits is implemented by applying the gate
from bit 1 of the first encoded qubit to bit 1 of the second
encoded qubit, and so on.

Figure 4. Fault-tolerant quantum computation. (a) Recovery operation.
(b) Single syndrome bit extraction.

However, there are no known computation codes for
which a universal set of encoded operations can be
implemented transversally. In practice, performing quantum
gates requires fault-tolerant preparation of several kinds
of ancillas, or scratch qubits. After each gate, we insert on
each qubit a recovery operation that consumes a syndrome
extraction ancilla to acquire syndrome bits. Syndrome
extraction ancillas must be available in great supply and
may need to be checked for critical errors using
verification ancillas. All of these operations must remain
fault tolerant when qubits can only interact locally.13

Figure 4 illustrates key aspects of this process. A
recovery operation, shown in Figure 4a, interacts
fault-tolerantly with the data via syndrome bit extraction networks
$S_1, S_2, \ldots, S_m$. This involves using a syndrome extraction
ancilla ($a_1, a_2, \ldots$) to measure each syndrome bit,
possibly several times, and storing the results to a classical
register. A classical computer processes the register and
applies the appropriate error correction R to the data.
Recovery operations follow every fault-tolerant gate to
correct errors potentially introduced by that gate.
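
The control flow of a recovery operation (extract each syndrome bit, possibly several times, store the results in a classical register, then look up a correction) can be illustrated with a purely classical analogue. The sketch below assumes a 3-bit repetition code with two parity checks; this code, the lookup table, and all names are illustrative assumptions, not the quantum codes or networks used in the article.

```python
from collections import Counter

# Assumed toy code: 3-bit repetition code with parity checks
# s1 = b0 XOR b1 and s2 = b1 XOR b2 (stand-ins for networks S_1, S_2).
CHECKS = [(0, 1), (1, 2)]

# Syndrome register -> index of the bit to flip (None means no error).
CORRECTION = {(0, 0): None, (1, 0): 0, (1, 1): 1, (0, 1): 2}


def extract_syndrome_bit(data, check):
    """Measure one parity check on the data block."""
    i, j = check
    return data[i] ^ data[j]


def recover(data, repeats=3):
    """Extract every syndrome bit `repeats` times, majority-vote each one
    into a classical register, and apply the looked-up correction R."""
    register = []
    for check in CHECKS:
        votes = [extract_syndrome_bit(data, check) for _ in range(repeats)]
        register.append(Counter(votes).most_common(1)[0][0])
    flip = CORRECTION[tuple(register)]
    corrected = list(data)
    if flip is not None:
        corrected[flip] ^= 1
    return corrected


print(recover([0, 1, 0]))  # single flipped bit in an encoded 0
```

In this noise-free toy the repeated extractions always agree; the repetition only matters once the extraction itself can fail, which is the situation the text describes.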

As Figure 4b shows, extracting a single syndrome bit
fault-tolerantly first requires an ancilla state. The
highlighted network prepares (P) and verifies (V) the ancilla;
a verification qubit indicates if the ancilla failed the
verification network V. Upon successful preparation of an
ancilla, the C network interacts with the data
fault-tolerantly to collect a syndrome bit. The quantum network
D then decodes and measures the bit. Some classical
postprocessing may take the place of D.

Fault-tolerant  architectures 

A quantum computation code conceptually separates
the logical and physical machine. Both architecture-driven
and software-driven designs exploit this fact to yield
two different processes within the framework of our
design flow.

An architecture-driven design process inserts fault-tolerant
gates from a predesigned library during technology-dependent
code generation. A design team creates
the library of universal, fault-tolerant, technology-specific
components using a combination of replacement rules,
heuristic methods, and device models, then
publishes the library together with
design rules for connecting the composite components.

A software-driven design process
inserts fault-tolerant gates during
technology-independent code generation
using replacement rules based
on quantum circuits. Sophisticated
schedulers and layout tools insert QPOL instructions to
preserve fault tolerance. Algorithmic optimizations
make fine-grained replacements, and compilers can use
feedback from simulators to focus the optimizers on the
circuit's critical regions. Our software architecture
allows such insertion and testing of error-correction and
fault-tolerance techniques at multiple stages in the
design flow.

Our  work  has  thus  far  focused  on  the  languages, 
transformations,  and  fault-tolerance  procedures 
needed  along  the  design  flow  to  produce  robust 
implementations.  However,  many  important  challenges 
remain  to  be  solved  before  researchers  can  build  or  even 
realistically  design  a  scalable  quantum  computer. 

To effectively use available quantum resources, we
must be able to schedule and synchronize parallel quantum
computations. We also need efficient technology-independent
optimization algorithms for realistic classes
of quantum circuits as well as strategies for adapting
generic circuits to specific architectural constraints and
implementation technologies.

Identifying and evaluating meaningful architectural
design blocks will necessitate further development of
simulation techniques for quantum circuits and high-level
programs.

Achieving robust, scalable quantum computation will
require both fault-tolerant architectural strategies
compatible with emerging quantum device technologies and
optimization algorithms that minimize the number of
fault paths, code size, or number of gates in fault-tolerant
circuits.

It will also be necessary to match tools to experimental
implementations as well as develop methodologies
for design verification and test such as quantum state
tomography, circuit-equivalence checking, and test-vector
generation.

The grandest challenge of all is to design a high-level
programming language that encapsulates the principles
of quantum mechanics in a natural way so that physicists
and programmers can develop and evaluate more quantum
algorithms.

Design and verification tools for robust quantum
circuits are vital to the future of quantum information
processing systems, and their development will
be a natural evolutionary step as such machines graduate
from the laboratory to engineering design.
Abstract

As quantum computing moves closer to reality the need for
basic architectural studies becomes more pressing. Quantum
wires, which transport quantum data, will be a fundamental
component in all anticipated silicon quantum architectures.
In this paper, we introduce a quantum wire architecture
based upon quantum teleportation. We compare
this teleportation channel with the traditional approach to
transporting quantum data, which we refer to as the swapping
channel. We characterize the latency and bandwidth
of these two alternatives in a device-independent way and
describe how the advanced architecture of the teleportation
channel overcomes a basic limit to the maximum
communication distance of the swapping channel. In addition,
we discover a fundamental tension between the scale of
quantum effects and the scale of the classical logic needed
to control them. This “pitch-matching” problem imposes
constraints on minimum wire lengths and wire intersections,
which in turn imply a sparsely connected architecture
of coarse-grained quantum computational elements.
This is in direct contrast to the “sea of gates” architectures
presently assumed by most quantum computing studies.


1  Introduction 

Many important problems seem to require exponential
resources on a classical computer. Quantum computers can
solve some of these problems with polynomial resources,
leading a great number of researchers to explore quantum
information processing technologies [28, 31, 13, 15, 41, 9,
17, 42]. Early-stage quantum computers have involved a
small number of components (fewer than 10) and have
utilized molecules in solution and trapped ions [47, 25, 35].
To exploit our tremendous historical investment in silicon,
however, solid-state silicon quantum computers are
desirable. Promising proposals along these lines have begun
to appear [22, 50]; these even include ideas that merge
atomic physics and silicon micromachining [24]. However,
as the number of components grows, quantum computing
systems will begin to require the same level of engineering
as current computing systems. The same process that we as
computer architects apply to classical silicon-based systems,
of building abstractions and optimizing structure, needs to
be applied to quantum technologies.


Even at this early stage, a general architectural study of
quantum computation is important. By investigating the
potential costs and fundamental challenges of quantum
devices, we can help illuminate previously unforeseen
obstacles to constructing a scalable quantum processor. We may
also anticipate and specify important subsystems and
techniques common to all implementations. Identifying these
practical challenges early will help focus the ongoing
development of fabrication and device technology. Developing
abstractions for quantum technology and basic architectural
concepts for it has proven to be quite fascinating.

This paper is about a seemingly mundane subject: a
wire. To be clear, we define a wire in the quantum world
as a mechanism for moving quantum data from one spatial
location to another. Any optimistic view of the future
of quantum computing includes enough interacting devices
to introduce a spatial extent to the layout of those devices.
This spatial dimension, in turn, introduces a need for wires.
As we will show, a quantum wire is a very different creature
from a classical one. One of the most important
distinctions between quantum and classical wires arises from
the fact that quantum information (composed of quantum
bits or qubits) cannot be copied [31]. Instead, it must be
transported from source to destination, destroying the
information at the source and re-creating it at the destination.
This fact changes our normal intuitions about the use of
buffers to drive wires, repeaters to amplify signals, and
fanout to distribute information. In particular, all wires must
be point-to-point and can only protect information rather
than amplifying it.

Quantum information can be encoded in a number of
ways, such as the spin component of basic particles like
protons or electrons, or in the polarization of photons.
Thus, there are several ways in which we might transfer
information. First, we might physically transport particles
from one point to another. In a large solid-state system, the
logical candidate for information carriers would be
electrons, since they are highly mobile. Unfortunately,
electrons are also highly interactive with the environment and
hence subject to corruption of their quantum state, a
process known as decoherence. Second, we might consider
passing information along a line of quantum devices. This
swapping channel is, in fact, a viable option for short
distances (as discussed in Section 4), but tends to accumulate
errors over long distances. In some ways, this solution
resembles a quantum-cellular automata (QCA) [32] wire,
except without the capability to duplicate data.
Over longer distances, we need something fundamentally
different. We propose to use a technique called
teleportation [7] and to call the resulting long-distance quantum
wire a teleportation channel to distinguish it from a
swapping channel. Teleportation uses an unusual quantum
property called entanglement, which allows quantum bits
to interact instantaneously at a distance¹. To understand
the mathematical details and practical implications of
teleportation, we will need to cover some background and prior
art before returning to the subject in Section 2.3.

In the remainder of this paper, we will quantify the
advantages and disadvantages of swapping channels versus
teleportation channels. Realistic concerns, such as quantum
error-correction [43] for protecting information from data
errors and entropy exchange [37] for generating zeros and
entangled pairs, greatly complicate things. Also important
is an often-neglected facet of quantum computing systems:
the fact that they depend upon classical signals for control
of quantum operations. We will explore the fundamental
tension between the scale at which quantum effects occur
and the scale at which classical signals can be reliably
routed. The architectural implications of this tension
manifest themselves as a pitch-matching problem.

Overall, the contributions of this research are:

• We define the basic building blocks required to
construct long and short quantum wires.

• We discover that the interface between classical
control and quantum devices requires minimum wire
lengths between fanout sites. We generalize these
limitations in terms of the ratio of quantum and classical
devices in a given technology and discuss the
architectural implications of these limitations.

• We find that the latency and bandwidth of swapping
channels are extremely sensitive to the length of the
channel, but that teleportation channels do not exhibit
the same sensitivity.

The remainder of this paper continues with a brief
introduction to quantum computing in Sections 2 and 3.
Section 4 introduces the swapping channels that can be
constructed from solid-state technologies and presents an
analysis of the scalability problems with these channels.
Section 5 presents teleportation channels, our architectural
solution to scalable quantum data transport. Section 6
discusses our future work in system bandwidth issues and in
Section 7 we conclude.

2  Quantum  Computing 

We begin with a brief overview of the basic terminology
and constructs of quantum computation. Our purpose is to
introduce the language necessary for subsequent sections;
in-depth treatments of these subjects are available in the
literature [31].

¹ Although this property sounds suspiciously like “faster-than-light”
communication, we shall see that the interaction is ambiguous without the
additional transmission of two bits of classical information, which must
travel at a subluminal velocity.

2.1  Quantum  states:  qubits 

The state of a classical digital system A can be specified
by a binary string x composed of a number of bits $x_i$, each
of which uniquely characterizes one elementary piece of
the system. For n bits, there are $2^n$ unique possible states.
The state of an analogous quantum system $\Psi$ is described
by a complex-valued vector $|\psi\rangle = \sum_x c_x |x\rangle$, a weighted
combination (a “superposition”) of the basis vectors $|x\rangle$,
where the probability amplitudes $c_x$ are complex numbers
whose modulus squared sums to one, i.e. $\sum_x |c_x|^2 = 1$.

A single quantum bit is commonly referred to as a qubit
and is described by the equation $|\psi\rangle = c_0|0\rangle + c_1|1\rangle$.
Such a qubit might be represented, for example, by the
nuclear spin of an atom. Legal qubit states include the basis
states $|0\rangle$ and $|1\rangle$, and states in superposition,
such as $\frac{1}{\sqrt{2}}|0\rangle + \frac{1}{\sqrt{2}}|1\rangle$. Also valid are
$\frac{1}{\sqrt{2}}(|0\rangle - |1\rangle)$ and
$\frac{1}{\sqrt{2}}(|0\rangle + i|1\rangle)$, which are other equal superpositions, but
with different relative phases between the basis states.

Larger quantum systems can be composed from multiple
qubits. For example, $|00\rangle$ is a valid two-qubit state, and
so is a normalized superposition of the form
$\alpha|00\rangle + \beta|01\rangle - \gamma|11\rangle$. An n-qubit state is described
by $2^n$ basis vectors, each with its own complex probability
amplitude, so an n-qubit system can exist in an arbitrary
superposition of the possible $2^n$ classical states of the
system. To compose multiple independent quantum systems
together, the tensor product operator $\otimes$ is used, e.g., $a \otimes b$.

Unlike the classical case, however, where the total can
be completely characterized by its parts, the state of larger
quantum systems cannot be described simply by giving the
individual states of its component qubits. This property,
known as entanglement, is best illustrated with an example:
there exist no single-qubit states $|\psi_a\rangle$ and $|\psi_b\rangle$ such
that the two-qubit state $|\Psi\rangle = \frac{1}{\sqrt{2}}|00\rangle + \frac{1}{\sqrt{2}}|11\rangle$ can be
expressed as the composite state $|\psi_a\rangle \otimes |\psi_b\rangle$. Entanglement
does not exist classically, and the unique properties of
entangled states are widely believed to be at the heart of what
gives quantum computers their computational powers.
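
The claim that this state cannot be factored can be checked numerically. The sketch below relies on a standard fact rather than anything stated in the paper: a two-qubit pure state with amplitudes $c_{00}, c_{01}, c_{10}, c_{11}$ is a tensor product of single-qubit states exactly when $c_{00}c_{11} - c_{01}c_{10} = 0$. The function name `is_product_state` is illustrative.

```python
import math


def is_product_state(c00, c01, c10, c11, tol=1e-12):
    """Return True when the two-qubit state factors as |a> (x) |b>.

    For a product state (a0|0> + a1|1>) (x) (b0|0> + b1|1>) the amplitudes
    are c_ij = a_i * b_j, so the 2x2 amplitude matrix has determinant zero;
    a non-zero determinant therefore signals entanglement.
    """
    return abs(c00 * c11 - c01 * c10) < tol


s = 1 / math.sqrt(2)
entangled = (s, 0, 0, s)   # (|00> + |11>) / sqrt(2), the state in the text
separable = (s, s, 0, 0)   # |0> (x) (|0> + |1>) / sqrt(2)

print(is_product_state(*entangled))
print(is_product_state(*separable))
```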

Another non-intuitive property of quantum states is their
behavior when measured. Upon observation, a quantum
state collapses into one of a number of possible classical
states, the set of possibilities being determined by the
measurement apparatus. Specifically, it is conventional (in
the quantum computation and quantum information
community) to adopt the computational basis states $|0 \ldots 00\rangle$,
$|0 \ldots 01\rangle$, $|0 \ldots 10\rangle$, ..., $|1 \ldots 11\rangle$, and choose
measurements to collapse states into this basis. The probability that
a particular basis state x results is $|c_x|^2$, the modulus squared
of the probability amplitude for the basis vector x. For
example, when $\frac{1}{\sqrt{2}}(|0\rangle + i|1\rangle)$ is measured, the outcome is $|0\rangle$
or $|1\rangle$ with equal probability. Similarly, when the state $|\Psi\rangle$,
above, is measured, the result is either $|00\rangle$ or $|11\rangle$, with
equal probability; the outcomes $|01\rangle$ or $|10\rangle$ never occur.

Figure 1. Basic Quantum Gates and their matrix representations.
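
The measurement rule above (outcome x occurs with probability equal to the modulus squared of its amplitude) can be written down directly. This is a minimal sketch; the function name `outcome_probabilities` and the convention that the leftmost bit of each label is the first qubit are assumptions made for illustration.

```python
import math


def outcome_probabilities(amplitudes):
    """Map each computational-basis label to |c_x|**2 for a normalized
    state given as a list of 2**n complex amplitudes."""
    size = len(amplitudes)
    n_qubits = size.bit_length() - 1
    if size < 2 or size != 1 << n_qubits:
        raise ValueError("expected 2**n amplitudes with n >= 1")
    total = sum(abs(c) ** 2 for c in amplitudes)
    if abs(total - 1.0) > 1e-9:
        raise ValueError("state is not normalized")
    return {format(i, "0{}b".format(n_qubits)): abs(c) ** 2
            for i, c in enumerate(amplitudes)}


s = 1 / math.sqrt(2)
print(outcome_probabilities([s, 1j * s]))     # (|0> + i|1>) / sqrt(2)
print(outcome_probabilities([s, 0, 0, s]))    # (|00> + |11>) / sqrt(2)
```

The relative phase i in the first example changes the state but not the measurement statistics in this basis, which is why both outcomes remain equally likely.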

Due to the probabilistic nature of measurement, designers
of quantum algorithms must be very clever about how to
get useful answers out of their computations. One method
is to iteratively skew probability amplitudes in a qubit
vector until the amplitude of the desired value is near 1 and
the other amplitudes are close to 0. This technique is used
in Grover’s algorithm for searching an unordered list of n
elements [18]. The algorithm goes through approximately
$\sqrt{n}$ iterations, at which point
a qubit vector representing the keys can be measured. The
desired element is found with high probability.
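
The amplitude-skewing idea can be simulated classically on a small list. The sketch below assumes the textbook form of Grover's iteration (a sign flip on the marked item followed by inversion about the mean) and an iteration count of $\lfloor (\pi/4)\sqrt{n} \rfloor$; these details and the name `grover_search` are not taken from the paper.

```python
import math


def grover_search(n_items, marked):
    """Simulate Grover's iteration on n_items amplitudes and return the
    number of iterations used and the final measurement probabilities."""
    amplitudes = [1 / math.sqrt(n_items)] * n_items   # uniform superposition
    iterations = int(math.floor(math.pi / 4 * math.sqrt(n_items)))
    for _ in range(iterations):
        amplitudes[marked] = -amplitudes[marked]      # oracle: flip the sign
        mean = sum(amplitudes) / n_items
        amplitudes = [2 * mean - a for a in amplitudes]  # invert about mean
    return iterations, [a * a for a in amplitudes]


iterations, probabilities = grover_search(64, marked=5)
print(iterations, probabilities[5])
```

Each iteration preserves the total probability while shifting weight toward the marked item, so measuring after the final iteration returns it with high probability.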

Another option in a quantum algorithm is to arrange the
computation such that it does not matter which of many
random results is measured from a qubit vector. This
method is used in Shor’s algorithm for factoring the
product of two large primes [40], which is built upon the
quantum Fourier transform, an exponentially fast version of the
classical discrete Fourier transform. Essentially, the
factorization is encoded within the period of a set of highly
probable values, from which the desired result can be obtained
no matter what value is measured. Since the presumed
intractability of factoring the product of two large primes is the
basis of nearly all public-key cryptographic security systems,
Shor’s algorithm has received much attention.

For the interested reader, quantum algorithms for a
variety of problems other than search and factoring have been
developed: adiabatic solution of optimization problems
(the quantum analogue of simulated annealing) [11],
precise clock synchronization (using EPR pairs to synchronize
GPS satellites) [21, 12], quantum key distribution (provably
secure distribution of classical cryptographic keys)
[6], and very recently, Gauss sums [46], testing of matrix
multiplication (in $O(n^{1.75})$ steps versus the $O(n^2)$ required
classically) [20], and Pell’s equation [19].


2.2  Quantum  gates  and  circuits 

Just as bits can be flipped using a NOT gate, and interact
with each other via multi-bit logic gates such as the XOR,
qubits can be operated on by gates such as those shown in
Figure 1. In the quantum realm, the role of the classical
truth table is played by a unitary operator U. The output
state vector is the operator applied to the input vector; that
is, $|\psi_{out}\rangle = U|\psi_{in}\rangle$. The X gate is analogous to the
classical NOT gate: it flips $|0\rangle$ and $|1\rangle$. The Z gate is
something new to the quantum realm: it flips the phase of the
$|1\rangle$ state, thus exchanging $\frac{1}{\sqrt{2}}(|0\rangle + |1\rangle)$ and
$\frac{1}{\sqrt{2}}(|0\rangle - |1\rangle)$.
The Hadamard gate H is another unusual single-qubit gate:
it turns $|0\rangle$ into $\frac{1}{\sqrt{2}}(|0\rangle + |1\rangle)$ and $|1\rangle$ into
$\frac{1}{\sqrt{2}}(|0\rangle - |1\rangle)$;
it can be thought of as performing a radix-2 Fourier
transform. Another important single-qubit gate, T, leaves $|0\rangle$
unchanged but multiplies $|1\rangle$ by $\sqrt{i}$. And analogous to
the classical XOR gate is the quantum controlled-NOT (or
CNOT) gate.

Figure 2. Quantum Teleportation of state $|a\rangle$ over distance.
First, entangled qubits $|b\rangle$ and $|c\rangle$ are exchanged.
Then, $|a\rangle$ is combined with $|b\rangle$ after which measurements
produce two classical bits of information (double
lines). After transport, these bits are used to manipulate
$|c\rangle$ to regenerate state $|a\rangle$ at destination.
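
The single-qubit gate actions described above can be verified by applying the usual 2x2 matrices to basis states. This is a small sketch using plain Python lists; the matrices are the standard ones, while the helper names `apply_gate` and `close` are illustrative.

```python
import cmath
import math

S = 1 / math.sqrt(2)

# Standard matrix forms, with rows and columns ordered |0>, |1>.
X = [[0, 1], [1, 0]]
Z = [[1, 0], [0, -1]]
H = [[S, S], [S, -S]]
T = [[1, 0], [0, cmath.exp(1j * math.pi / 4)]]   # multiplies |1> by sqrt(i)

ZERO, ONE = [1, 0], [0, 1]
PLUS, MINUS = [S, S], [S, -S]    # (|0> + |1>)/sqrt(2) and (|0> - |1>)/sqrt(2)


def apply_gate(gate, state):
    """Return the matrix-vector product gate * state."""
    return [sum(gate[r][c] * state[c] for c in range(2)) for r in range(2)]


def close(u, v, tol=1e-12):
    """Compare two state vectors component-wise."""
    return all(abs(a - b) < tol for a, b in zip(u, v))


print(close(apply_gate(X, ZERO), ONE))      # X flips |0> to |1>
print(close(apply_gate(Z, PLUS), MINUS))    # Z exchanges the two superpositions
print(close(apply_gate(H, ZERO), PLUS))     # H creates an equal superposition
```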

Together, these gates form a universal set: just as any
Boolean circuit can be composed from AND and NOT gates,
any polynomially describable multi-qubit quantum
transform U can be efficiently approximated by composing
these quantum gates into a circuit. In addition to these
universal gates, one more important operator is the SWAP
gate. SWAP can be implemented as three CNOTs. However,
SWAP is often available as a basic gate for a given
technology, which is a valuable thing, given its importance to
quantum communication.
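
The statement that SWAP can be built from three CNOTs can be confirmed by multiplying the 4x4 matrices. In this sketch the basis order |00>, |01>, |10>, |11> (first qubit written first) and the names `CNOT_01` and `CNOT_10` are conventions chosen for illustration.

```python
def matmul(a, b):
    """Multiply two square matrices given as lists of rows."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


# Basis order: |00>, |01>, |10>, |11>.
CNOT_01 = [[1, 0, 0, 0],   # control = first qubit, target = second qubit
           [0, 1, 0, 0],
           [0, 0, 0, 1],
           [0, 0, 1, 0]]
CNOT_10 = [[1, 0, 0, 0],   # control = second qubit, target = first qubit
           [0, 0, 0, 1],
           [0, 0, 1, 0],
           [0, 1, 0, 0]]
SWAP = [[1, 0, 0, 0],
        [0, 0, 1, 0],
        [0, 1, 0, 0],
        [0, 0, 0, 1]]

# Three alternating CNOTs exchange the two qubits.
three_cnots = matmul(CNOT_01, matmul(CNOT_10, CNOT_01))
print(three_cnots == SWAP)
```

Because the three CNOTs act one after another, a technology that offers SWAP as a native gate avoids a threefold increase in gate count on every hop of a swapping channel.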

In quantum circuits, time goes from left to right, where
single lines represent qubits, and double lines represent
classical bits. A meter is used to represent measurement.
By convention, black dots represent control terminals for
quantum-controlled gates. The $\oplus$ symbol is shorthand for
the target qubit of the CNOT gate (Figure 2).


2.3  Quantum  teleportation 

Quantum teleportation is the re-creation of a quantum state
at a distance. Contrary to its science fiction counterpart,
quantum teleportation is not instantaneous transmission of
information. Rather, it uses an entangled EPR pair,
$|\Phi\rangle = \frac{1}{\sqrt{2}}(|00\rangle + |11\rangle)$ [4].

Figure 2 gives an overview of the teleportation process.
We start by generating an EPR pair. We separate the pair,
keeping one qubit, $|b\rangle$, at the source and transporting the
other, $|c\rangle$, to the destination. When we want to send a
qubit, $|a\rangle$, we first interact $|a\rangle$ with $|b\rangle$ using a CNOT gate
(followed, in the standard protocol, by a Hadamard gate on $|a\rangle$).
We then measure $|a\rangle$ and $|b\rangle$ in the computational basis,
send the two one-bit classical results to the destination,
and use those results to re-create the correct phase and
amplitude in $|c\rangle$ such that it takes on the original state of $|a\rangle$.
The re-creation of phase and amplitude is done with X and
Z gates, whose application is contingent on the outcome of
the measurements of $|a\rangle$ and $|b\rangle$. Intuitively, since $|c\rangle$ has a
special relationship with $|b\rangle$, interacting $|a\rangle$ with $|b\rangle$ makes
$|c\rangle$ resemble $|a\rangle$, modulo a phase and/or amplitude error.
The two measurements allow us to correct these errors and
re-create $|a\rangle$ at the destination. Note that the original state
of $|a\rangle$ is destroyed when we take our two measurements.
This is consistent with the “no-cloning” theorem, which
states that a quantum state cannot be copied.
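
The teleportation steps can be traced on a three-qubit state vector. The sketch below follows the standard textbook protocol, which applies a Hadamard gate to $|a\rangle$ after the CNOT and before measurement; the qubit ordering (a, b, c), the helper names, and the seeded random generator are assumptions made for illustration, not the paper's implementation.

```python
import math
import random

S = 1 / math.sqrt(2)
H = [[S, S], [S, -S]]
X = [[0, 1], [1, 0]]
Z = [[1, 0], [0, -1]]
N = 3  # qubit 0 = a (data), qubit 1 = b (source half), qubit 2 = c (destination)


def apply_1q(state, gate, q):
    """Apply a 2x2 gate to qubit q of an N-qubit state vector."""
    shift = N - 1 - q
    new = [0j] * len(state)
    for i, amp in enumerate(state):
        bit = (i >> shift) & 1
        for out in (0, 1):
            j = (i & ~(1 << shift)) | (out << shift)
            new[j] += gate[out][bit] * amp
    return new


def apply_cnot(state, control, target):
    """Flip the target qubit wherever the control qubit is 1."""
    cs, ts = N - 1 - control, N - 1 - target
    new = [0j] * len(state)
    for i, amp in enumerate(state):
        new[i ^ (1 << ts) if (i >> cs) & 1 else i] += amp
    return new


def measure(state, q, rng):
    """Measure qubit q in the computational basis and collapse the state."""
    shift = N - 1 - q
    p1 = sum(abs(a) ** 2 for i, a in enumerate(state) if (i >> shift) & 1)
    outcome = 1 if rng.random() < p1 else 0
    norm = math.sqrt(p1 if outcome else 1 - p1)
    return outcome, [a / norm if ((i >> shift) & 1) == outcome else 0j
                     for i, a in enumerate(state)]


def teleport(alpha, beta, rng):
    """Teleport alpha|0> + beta|1> from qubit a to qubit c."""
    state = [0j] * 8
    state[0b000], state[0b100] = alpha, beta
    state = apply_cnot(apply_1q(state, H, 1), 1, 2)   # EPR pair on b and c
    state = apply_1q(apply_cnot(state, 0, 1), H, 0)   # interact a with b
    m_a, state = measure(state, 0, rng)
    m_b, state = measure(state, 1, rng)
    if m_b:                                           # classical bits select
        state = apply_1q(state, X, 2)                 # the X correction
    if m_a:
        state = apply_1q(state, Z, 2)                 # and the Z correction
    base = (m_a << 2) | (m_b << 1)
    return state[base], state[base | 1]               # amplitudes of qubit c


print(teleport(0.6, 0.8j, random.Random(0)))
```

Whatever the two measurement outcomes are, the corrections leave qubit c in the state that qubit a started in, while qubit a itself has collapsed to a basis state.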

Why bother with teleportation when we end up
transporting $|c\rangle$ anyway? Why not just transport $|a\rangle$ directly?
First, we can pre-communicate EPR pairs with extensive
pipelining without stalling computations. Second, it is
easier to transport EPR pairs than real data. Since $|b\rangle$ and $|c\rangle$
have known properties, we can employ a specialized
procedure known as purification to turn a collection of pairs
partially damaged from transport into a smaller collection
of asymptotically perfect pairs. Third, transmitting the two
classical bits resulting from the measurements is more
reliable than transmitting quantum data.


3  Solid-State  Technologies 

With some basics of quantum operations in mind, we turn
our attention to the technologies available to implement
these operations. Experimentalists have examined several
technologies for quantum computation, including Josephson
junctions [30, 50], trapped ions [29], photons [45],
bulk spin NMR [48], and phosphorus impurities in
silicon [22]. Of these proposals, only those building on a
solid-state platform are expected to provide the scalability
required to achieve a useful computational substrate.
The Kane [22, 42] scheme of phosphorus in silicon builds
upon modern semiconductor fabrication and transistor
design, drawing upon understood physical properties. To
focus the presentation in this paper, we begin our calculations
with the Kane proposal, and then generalize to consider
limits imposed by any solid-state technology. This quantum
analysis proceeds in precisely the same manner that
it would in the classical domain: by characterizing device
technologies with a few underlying parameters.

Figure 3. The basic quantum bit technology proposed
by Kane [42]. Qubits are embodied by the
nuclear spin of a phosphorus atom coupled with
an electron embedded in silicon under high magnetic
field at low temperature.

Kane proposes that the nuclear spin of a phosphorus
atom coupled with an electron embedded in silicon under
a high magnetic field and low temperature can be used as a
quantum bit, much as nuclear spins in molecules have been
shown to be good quantum bits for quantum computation
with nuclear magnetic resonance [15]. This quantum bit
is classically controlled by a local electric field. The
process is illustrated in Figure 3. Shown are two phosphorus
atoms spaced 15-100 nm apart. This inter-qubit spacing is
currently a topic of debate within the physics community,
with conservative estimates of 15 nm, and more aggressive
estimations of 100 nm. What is being traded off is noise
immunity versus difficulty of manufacturing. For our study,
we will use a figure (60 nm) that lies between these two. We
parameterize our work, however, to generalize for changes
in the underlying technology.

Twenty nanometers above the phosphorus atoms lie
three classical wires that are spaced 20 nm apart. By
applying precisely timed pulses to these electrodes, Kane
describes how arbitrary one- and two-qubit quantum gates
can be realized. Four different sets of pulse signals must
be routed to each electrode to implement a universal set of
quantum operations. The details of the pulses and quantum
mechanics of this technique are beyond the scope of this
paper and are described in [42].

The Kane proposal, like all quantum computing
proposals, uses classical signals to control the timing and
sequence of operations. All known quantum algorithms,
including basic error correction for quantum data, require the
determinism and reliability of classical control. Without
efficient classical control, fundamental results demonstrating
the feasibility of quantum computation do not apply (such
as the Threshold Theorem used in Section 4.2.3).

Quantum computing systems display a characteristic
tension between computation and communication.
Fundamentally, technologies that transport data well do so
because they are resistant to interaction with the
environment or other quantum bits; on the other hand, technologies
that compute well do so precisely because they do
interact. Thus, computation and communication are somewhat
at odds.

Figure 4. Short wires are constructed from successive
qubits (phosphorus atoms). Information
in the quantum data path is swapped from atom to
atom by classical control. This localized control
produces swapping behavior through a repeated
series of three back-to-back CNOT operations.

Figure 5. Quantization of electron states overcome
by increasing the physical dimension of the control
lines beyond 100 nm. The states propagate
quantum-mechanically downward through access
vias to control the magnetic field around the
phosphorus atoms. Annotations in the figure: 10 nm access
points contain only a handful of quantum states for their
electrons at temperatures less than 1 K, preventing correct
operation. As two physical dimensions of the access point
exceed 100 nm, thousands of electron states are held.
Classically, electron states are restricted to the access point;
some electrons will, however, scatter into the narrow wire and
move ballistically downward, enabling proper control.

In  particular,  atomic-based  solid-state  technologies  are 
good  at  providing  scalable  computation  but  complicate 
communication,  because  their  information  carriers  have 
nonzero  mass.  The  Kane  proposal,  for  example,  repre¬ 
sents  a  quantum  bit  with  the  nuclear  spin  of  a  phosphorus 
atom  implanted  in  silicon.  The  phosphorus  atom  does  not 
move,  thus  transporting  this  state  to  another  part  of  the  chip 
is  laborious  and  requires  carefully  controlled  swapping 
of  the  states  of  neighboring  atoms.  In  contrast,  photon- 
based  proposals  that  use  polarization  to  represent  quantum 
states  can  easily  transport  data  over  long  distances  through 
fiber.  It  is  very  difficult,  however,  to  get  photons  to  in¬ 
teract  and  achieve  any  useful  computation.  Further,  trans¬ 
ferring  quantum  states  between  atomic  and  photon-based 
technologies  is  extremely  difficult. 

Optimizing these tensions, between communication and
computation and between classical control and quantum effects,
implies a structure to quantum systems. Rather than cover the
gamut of quantum architecture, we focus on one crucial
architectural concept: a wire. Specifically, we begin by
examining a short wire.

4  Short  Wires 

We begin by examining a “short” quantum wire. Section 4.2
shows that the basic short wire does not scale well, hence a
more scalable approach appears in Section 5.

In solid-state technologies, a line of qubits is one plausible
approach to transporting quantum data. Figure 4 provides a
schematic of a swapping channel in which information is
progressively swapped between pairs of qubits in the quantum
datapath, somewhat like a bubble sort (see footnote 2). Swapping
channels require active control from classical logic,
illustrated by the classical control plane of Figure 4.
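The swap primitive that a swapping channel relies on can be checked directly. The sketch below is a minimal illustration, not part of the original design: it verifies that three back-to-back controlled-NOT gates, with the control and target roles alternating, equal a SWAP on two qubits. The helper `matmul` and the matrix names are hypothetical, and the basis ordering |00>, |01>, |10>, |11> is an assumed convention.

```python
# Two-qubit basis order (assumed): |00>, |01>, |10>, |11>,
# with the first qubit written as the left bit.

def matmul(a, b):
    """Multiply two square matrices given as lists of rows."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

# CNOT with the first qubit as control and the second as target.
CNOT_12 = [[1, 0, 0, 0],
           [0, 1, 0, 0],
           [0, 0, 0, 1],
           [0, 0, 1, 0]]

# CNOT with the second qubit as control and the first as target.
CNOT_21 = [[1, 0, 0, 0],
           [0, 0, 0, 1],
           [0, 0, 1, 0],
           [0, 1, 0, 0]]

# SWAP exchanges |01> and |10> and leaves |00> and |11> alone.
SWAP = [[1, 0, 0, 0],
        [0, 0, 1, 0],
        [0, 1, 0, 0],
        [0, 0, 0, 1]]

# Three CNOTs in succession, alternating control and target.
three_cnots = matmul(CNOT_12, matmul(CNOT_21, CNOT_12))

print("three CNOTs equal SWAP:", three_cnots == SWAP)
```

Because the matrices contain only integers, the comparison is exact. Moving a qubit one position along the wire therefore costs three CNOT operations in a technology without an intrinsic swap.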

4.1  Technical  Challenges 

As simple as it might appear, a quantum swapping channel
presents significant technical challenges. The first hurdle
is the placement of the phosphorus atoms themselves. The
leading work in this area has involved precise ion implantation
through masks, and manipulation of single atoms on the surface
of silicon [23]. For applications where substantial monetary
investment is not an issue, slowly placing a few hundred
thousand phosphorus atoms with a probe device [16] may be
possible. For bulk manufacturing, DNA-based or other chemical
self-assembly techniques [1] may need to be developed. Note
that while new technologies may be developed to enable precise
placement, the key for our work is only the spacing (60 nm)
of the phosphorus atoms themselves, and the number of
control lines (3) per qubit. The relative scale of quantum
interaction and the classical control of these interactions is
what will lead our analysis to the fundamental constraints
on quantum computing architectures.

A  second  challenge  is  the  scale  of  classical  control. 
Each  control  line  into  the  quantum  datapath  is  roughly  10 
nm  in  width.  While  such  wires  are  difficult  to  fabricate,  we 
expect  that  either  electron  beam  lithography  [3],  or  phase- 
shifted  masks  [36]  will  make  such  scales  possible. 

2 For technologies that do not have an intrinsic swap operation,
one can be implemented by three controlled-not gates performed
in succession. This is a widely known result in the quantum
computing field and we refer the interested reader to [31].

Figure 6. A linear row of quantum bits: In this figure (not
drawn to scale) we depict access control for a line of quantum
bits. On the left, we depict a “top down” view. On the right is
a vertical cross-section which more clearly depicts the
narrow-tipped control lines that quickly expand to classical
dimensions. [Figure labels: Classical pitch (~100 nm); Classical
control access points]


A remaining challenge is the temperature of the device.
In order for the quantum bits to remain stable for a reasonable
period of time, the device must be cooled to less than one
kelvin. The cooling itself is straightforward, but the effect
of the cooling on the classical logic is a problem. Two issues
arise. First, conventional transistors stop working as the
electrons become trapped near their dopant atoms, which fail to
ionize. Second, the 10 nm classical control lines begin to
exhibit quantum-mechanical behavior such as conductance
quantization and interference from ballistic transport [14].

Fortunately, many researchers are already working on
low-temperature transistors. For instance, single-electron
transistors (SETs) [27] are the focus of intense research
due to their high density and low power properties. SETs,
however, have been problematic for conventional computing
because they are sensitive to noise and operate best at
low temperatures. For quantum computing, this predilection
for low temperatures is exactly what is needed! Tucker
and Shen describe this complementary relationship and
propose several fabrication methods in [44].

On the other hand, the quantum-mechanical behavior of
the control lines presents a subtle challenge that has been
mostly ignored to date. At low temperatures, and in narrow
wires, the quantum nature of electrons begins to dominate
over normal classical behavior. For example, in 100 nm
wide polysilicon wires at 100 millikelvin, electrons propagate
ballistically like waves, through only one conductance
channel, which has an impedance given by the quantum of
resistance, $h/e^2 \approx 25\,\mathrm{k\Omega}$. Impedance
mismatches to these and similar metallic wires make it
impossible to properly drive the AC current necessary to
perform qubit operations.
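For a sense of scale, the quantum of resistance can be evaluated directly from the SI values of h and e. The sketch below does so and, purely as an illustration of what an impedance mismatch means, evaluates the standard transmission-line reflection coefficient for a hypothetical 50 Ω driver. The 50 Ω figure and the function names are assumptions made for this example, not values taken from this report.

```python
PLANCK_H = 6.62607015e-34     # Planck constant in J*s (exact SI value)
ELECTRON_E = 1.602176634e-19  # elementary charge in C (exact SI value)

def resistance_quantum_ohms():
    """Quantum of resistance h / e^2, in ohms."""
    return PLANCK_H / ELECTRON_E ** 2

def reflection_coefficient(z_load, z_source):
    """Standard voltage reflection coefficient between two impedances."""
    return (z_load - z_source) / (z_load + z_source)

r_q = resistance_quantum_ohms()
# Hypothetical 50-ohm driver feeding a single conductance channel.
gamma = reflection_coefficient(r_q, 50.0)

print(f"h/e^2 = {r_q / 1e3:.1f} kOhm")
print(f"reflection coefficient for a 50-ohm source: {gamma:.4f}")
```

Under these assumptions nearly all of the drive signal would be reflected, which is consistent with the requirement that narrow wires be short and driven locally by wide ones.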

Avoiding such limitations mandates a geometric design
constraint: narrow wires must be short and locally driven
by nearby wide wires. Using 100 nm as a rule of thumb (see
footnote 3) for a minimum metallic wire width sufficient to
avoid undesired quantum behavior at these low temperatures, we
obtain a control gate structure such as that depicted in
Figure 5. Here, wide wires terminate in 10 nm vias that act as
local gates above individual phosphorus atoms.

3 This value is based on typical electron mean free path
distances, given known scattering rates and the electron Fermi
wavelength in metals.

Figure 7. Intersection of quantum bits. In this simplified
view, we depict a four-way intersection of quantum bits. A
diamond-shaped junction is also needed to densely pack junction
cells.

Producing a line of quantum bits that overcomes all of
the above challenges is possible. We illustrate a design in
Figure 6. Note how access lines quickly taper into upper
layers of metal and into control areas of a classical scale.
These control areas can then be routed to access transistors
that can gate on and off the frequencies (in the tens to
hundreds of MHz) required to apply specific quantum gates.

Of course, any solution for data transport must also support
routing. Routing is not possible without fanout provided by
wire intersections. We can extend our linear row of quantum
bits to a four-way intersection capable of supporting sparsely
intersecting topologies of quantum bits. We illustrate the
quantum intersection in Figure 7. This configuration is similar
to Figure 6 except that the intersection creates a more
challenging tapering.

4.2  Analysis

We  now  analyze  this  short  wire  to  derive  two  important 
architectural  constraints:  the  classical-quantum  interface 
boundary  and  the  latency/bandwidth  characteristics.  We 
strive  to  achieve  a  loose  lower  bound  on  these  constraints 
for  a  given  quantum  device  technology.  While  future  quan¬ 
tum  technologies  may  have  different  precise  numbers,  it 
is  almost  certain  they  will  continue  to  be  classically  con¬ 
trolled,  and  thus  also  obey  similar  constraints  based  upon 
this  classical-quantum  interface. 

4.2.1  Pitch  Matching 

Our first constraint is derived from the need to have classical
control of our quantum operations. As previously discussed, we
need a minimum wire width to avoid quantum effects in our
classical control lines. Referring back to Figure 7, we can see
that each quadrant of our four-way intersection will need to be
some minimum size to accommodate access to our control signals.

Recall from Figure 3 that each qubit has three associated
control signals (one A and two S gates). Each of these control
lines must expand from a thin 10 nm tip into a 100 nm
access point in an upper metal layer to avoid charge
quantization effects at low temperatures (Figure 5). Given this
structure, it is possible to analytically derive the minimum
width of a line of qubits and its control lines, as well as the
size of a four-way intersection. For this minimum size
calculation, we assume all classical control lines are routed in
parallel, albeit spread across the various metal layers. This
parallel nature makes this calculation trivial under normal
circumstances (sufficiently “large” lithographic feature
size $\lambda_c$), with the minimum line segment being equal in
length to twice the classical pitching, 150 nm in our case,
and the junction size equal to four times the classical
pitching, 400 nm, in size. However, we illustrate the detailed
computation to make the description of the generalization
clearer. We begin with a line of qubits.

Let N be the number of qubits along the line segment.
Since there are three gates (an A and two S lines) we need
to fit in 3N classical access points of 100 nm in dimension
each, in the line width. We accomplish this by offsetting
the access points in the x and y dimensions (Figure 6) by
20 nm. The total size of these offsets will be 100 nm divided
by the qubit spacing of 60 nm, times the number of control
lines per qubit (3), times the offset distance of 20 nm. This
number, $100\,\mathrm{nm}/60\,\mathrm{nm} \times 3 \times 20\,\mathrm{nm} = 100\,\mathrm{nm}$,
is divided by 2 because the access lines are spread out on each
side of the wire. Hence, the minimum line segment will
be 100 nm + 50 nm = 150 nm. Shorter line segments within
larger, more specialized cells are possible.

Turning our attention to an intersection (Figure 7), let
N be the number of qubits along each “spoke” of the junction.
We need to fit 3N classical access points in a space
of $(60\,\mathrm{nm} \times N)^2$, where each access point is at
least 100 nm on a side. As with the case of a linear row of
bits, a 20 nm x and y shift in access point positioning between
layers is used for via access. Starting with a single access
pad of 100 nm, we must fit $100\,\mathrm{nm}/60\,\mathrm{nm} \times 3$
additional pads shifted in x and y within the single quadrant
of our intersection. This leads to a quadrant size of
$100\,\mathrm{nm} + 100\,\mathrm{nm}/60\,\mathrm{nm} \times 3 \times 20\,\mathrm{nm} = 200\,\mathrm{nm}$.
Therefore, the minimum size four-way intersection is 8
(rounding up) qubits in each direction.

In this construction we have assumed a densely packed
edge to each spoke; however, this is easily “unpacked” with
a specialized line segment, or by joining to another junction
that is constructed inversely from that shown in Figure 7.
Obviously, the specific sizes will vary according to
technological parameters and assumptions about control logic,
but this calculation illustrates the approximate effect of
what appears to be a fundamental tension between quantum
operations and the classical signals that control them. A
minimum intersection size implies minimum wire lengths,
which imply a minimum size for computation units.

4.2.2  Technology  Independent  Limits 

Thus far we have focused our discussion on a particular
quantum device technology. This has been useful to make
the calculations concrete. Nevertheless, it is useful to
generalize these calculations to future quantum device
technologies. Therefore we parameterize our discussion based
on a few device characteristics.

Assuming two-dimensional devices (i.e. not a cube of
quantum bits), let $p_c$ be the classical pitching required, and
$p_q$ the quantum one. Furthermore, let $R$ be the ratio $p_c/p_q$
of the classical to quantum distance for the device technology,
$m$ be the number of classical control lines required per
quantum bit, and finally $\lambda_c$ be the feature size of the
lithographic technology. We use two separate variables $p_c$ and
$\lambda_c$ to characterize the “classical” technology because they
arise from different physical constraints. The parameter $\lambda_c$
comes from the lithographic feature size, while $p_c$ (which is
a function of $\lambda_c$) is related to the charge quantization
effect of electrons in gold. With the Kane technology we assume
a spacing $p_q$ of 60 nm between qubits, three control lines
per bit of 100 nm ($p_c$) each, and a $\lambda_c$ of 5 nm. We can use
these to generalize our pitch matching equations. Here we
find that the minimum line segment is simply equivalent to
$R(1 + 2\lambda_c m/p_q)$ qubits in length.

Examining our junction structure (Figure 7), we note
that it is simply four line segments, similar to those
calculated above, except that the control lines must be on
the same side. Therefore the minimum crossing size of
quantum bits in a two-dimensional device is of size
$\sim 2R(1 + 4\lambda_c m/p_q)$ on a side.
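The two pitch-matching expressions can be evaluated numerically as a consistency check. The sketch below is a minimal illustration that assumes both results are measured in qubits, so multiplying by the qubit spacing converts them to nanometers; the function and constant names are hypothetical. With the Kane parameters quoted above it reproduces the line-segment and junction sizes derived in Section 4.2.1.

```python
def line_segment_qubits(p_c, p_q, m, lambda_c):
    """Minimum line segment, in qubits: R * (1 + 2 * lambda_c * m / p_q)."""
    ratio = p_c / p_q  # R, the classical-to-quantum pitch ratio
    return ratio * (1 + 2 * lambda_c * m / p_q)

def junction_side_qubits(p_c, p_q, m, lambda_c):
    """Minimum junction side, in qubits: 2R * (1 + 4 * lambda_c * m / p_q)."""
    ratio = p_c / p_q
    return 2 * ratio * (1 + 4 * lambda_c * m / p_q)

# Kane parameters quoted in the text (all lengths in nm).
P_C = 100.0     # classical pitch
P_Q = 60.0      # qubit spacing
M = 3           # control lines per qubit
LAMBDA_C = 5.0  # lithographic feature size

line_nm = line_segment_qubits(P_C, P_Q, M, LAMBDA_C) * P_Q
junction_nm = junction_side_qubits(P_C, P_Q, M, LAMBDA_C) * P_Q

print(f"minimum line segment: about {line_nm:.0f} nm")
print(f"minimum junction side: about {junction_nm:.0f} nm")
```

Changing `P_C`, `P_Q`, `M`, or `LAMBDA_C` shows how a different device technology would shift these minimum sizes.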

4.2.3  Latency  and  Bandwidth 

Calculating the latency and bandwidth of quantum wires
is similar to, but slightly different from, the calculation for
classical systems. The primary difficulty is decoherence, i.e.
quantum noise. Unlike classical systems, if you want to perform
a quantum computation, you cannot simply re-send
quantum  data  when  an  error  is  detected.  The  “no-cloning” 
theorem  [31],  according  to  which  quantum  states  cannot 
be  perfectly  copied,  prohibits  transmission  by  duplication, 
thereby  making  it  impossible  to  re-transmit  quantum  data 
if  it  is  corrupted.  Once  the  data  is  destroyed  by  the  noisy 
channel,  you  have  to  start  the  entire  computation  over.  To 
avoid  this  loss,  quantum  data  is  encoded  in  a  sufficiently 
strong  error-correcting  code  that,  with  high  probability,  the 
data  will  remain  coherent  for  the  entire  length  of  the  quan¬ 
tum  algorithm.  Unfortunately,  quantum  systems  will  be  so 
error-prone  that  they  will  execute  right  at  the  limits  of  their 
error  tolerance  [33]. 

Our  goal  is  to  provide  a  quantum  communication  layer 
which  sits  below  higher  level  error  correction  schemes.  We 
will  discuss  our  future  work,  which  is  the  interaction  of  this 
layer  with  quantum  error  correction  and  algorithms  in  Sec¬ 
tion  6.  Consequently,  we  start  our  calculation  by  assuming 
a  channel  with  no  error  correction.  Then  we  factor  in  the 
effects  of  decoherence  and  derive  a  maximum  wire  length 
for  our  line  of  qubits. 

Recall that data traverses the line of qubits with swap
gates, each of which takes approximately 1 µs to execute
in the Kane technology. Thus, a single row of quantum bits
has latency:

$\mathrm{latency} = 1\,\mu\mathrm{s} \times \mathrm{distance} / 60\,\mathrm{nm}$  (1)

This latency can be quite large. A short 1 µm wire has a
latency of 0.000017 seconds! On the plus side, the wire
can be fully pipelined and has a sustained bandwidth of
$1/(1\,\mu\mathrm{s}) = 1\mathrm{M}$ qbps (quantum bits per second). This may
seem small compared to a classical wire, but keep in mind
that quantum bits hold an exponential amount of information
and can enable algorithms with exponential power.

The  number  of  error-free  qubits  is  actually  lower  than 
this  physical  bandwidth.  Noise,  or  decoherence,  degrades 
quantum  state  and  makes  the  true  bandwidth  of  our  wire 
less  than  the  physical  quantum  bits  per  second.  Bits  deco¬ 
here  over  time,  so  longer  wires  will  have  a  lower  bandwidth 
than  shorter  ones. 

The stability of a quantum bit over time decays (exactly
like an un-error-corrected classical bit) as a function $e^{-kt}$.
Usually, a normalized form of this equation is used, $e^{-\lambda t}$,
where $t$ in this new equation is the number of operations
and $\lambda$ is related to the time per operation and the original
$k$. As quantum bits traverse through our wire they arrive
with a fidelity that decays exponentially with the latency,
namely:

$\mathrm{fidelity} = e^{-k \times \mathrm{latency}}$  (2)

The true bandwidth is then proportional to the fidelity:

$\mathrm{bandwidth}_{true} = \mathrm{bandwidth}_{physical} \times \mathrm{fidelity}$  (3)

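Choosing a reasonable value of $\lambda \approx 10^{-6}$ (see footnote 4),
we find the true bandwidth of a wire to be:

$\mathrm{bandwidth}_{true} = \frac{1}{1\,\mu\mathrm{s}} \times e^{-10^{-6} \times \mathrm{distance}/60\,\mathrm{nm}}$  (4)

which for a 1 µm wire is close to ideal (999,983 qbps).

4 This value for $\lambda$ is calculated from a decoherence rate of $10^{-6}$ per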

This does not seem to be a major effect, until you consider
an entire quantum algorithm. Data may traverse back
and forth across a quantum wire millions of times. It is
currently estimated [2] that a degradation of fidelity more than
$10^{-4}$ makes arbitrarily long quantum computation
theoretically unsustainable, with the practical limit being far
higher [33]. This limit is derived from the Threshold Theorem,
which relates the decoherence of a quantum bit to
the complexity of correcting this decoherence [26, 34, 2] (see
footnote 5). Given our assumptions about $\lambda$, the maximum
theoretical wire distance is about 6 µm, and again the practical
wire distance is about two orders of magnitude less than this.
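The figures in this subsection can be reproduced with a few lines of arithmetic. The sketch below is a minimal check under the parameters stated in the text (1 µs per swap, 60 nm qubit spacing, a decoherence of about 10^-6 per operation, and a tolerable loss of fidelity of 10^-4). The function names are hypothetical, and the maximum length is obtained by assuming the wire fails once 1 - e^(-λD) exceeds that tolerance, with D the number of qubits traversed.

```python
import math

SWAP_TIME_S = 1e-6       # time per swap, from the text
QUBIT_SPACING_M = 60e-9  # spacing between phosphorus atoms, from the text
LAMBDA = 1e-6            # decoherence per operation, from the text
THRESHOLD_C = 1e-4       # tolerable loss of fidelity, from the text

def hops(distance_m):
    """Number of qubit-to-qubit swaps needed to cover a distance."""
    return distance_m / QUBIT_SPACING_M

def latency_s(distance_m):
    """Equation (1): one swap time per hop."""
    return SWAP_TIME_S * hops(distance_m)

def fidelity(distance_m):
    """Exponential decay of fidelity with the number of operations."""
    return math.exp(-LAMBDA * hops(distance_m))

def true_bandwidth_qbps(distance_m):
    """Physical bandwidth (one qubit per swap time) scaled by fidelity."""
    return (1.0 / SWAP_TIME_S) * fidelity(distance_m)

def max_distance_m():
    """Longest wire whose loss of fidelity stays within THRESHOLD_C."""
    max_hops = math.log(1.0 - THRESHOLD_C) / (-LAMBDA)
    return max_hops * QUBIT_SPACING_M

one_micron = 1e-6
print(f"latency over 1 um:   {latency_s(one_micron) * 1e6:.1f} us")
print(f"bandwidth over 1 um: {true_bandwidth_qbps(one_micron):,.0f} qbps")
print(f"maximum wire length: {max_distance_m() * 1e6:.2f} um")
```

The three printed values correspond to the roughly 17 µs latency, the 999,983 qbps bandwidth, and the 6 µm limit quoted above.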

4.2.4  Technology  Independent  Metrics 

Our latency and bandwidth calculations require slightly
more device parameters. Let $T$ be the time per basic swap
operation. Some technologies will have an intrinsic SWAP,
and others will require synthesizing the swap from 3 CNOT
operations. Let $\lambda$ be the decoherence rate, which for small
$\lambda$ and $T$ is equivalent to the decoherence a quantum bit
undergoes in a unit of operation time $T$. This makes the
latency of a swapping channel wire equal to:

$\mathrm{latency} = T \times D$  (5)

where distance $D$ is expressed in the number of qubits.
The bandwidth is proportional to the fidelity, or:

$\mathrm{bandwidth}_{true} = \frac{1}{T} e^{-\lambda D}$  (6)

This bandwidth calculation is correct so long as the loss of
fidelity remains below the critical threshold $C \approx 10^{-4}$
required for fault tolerant computation. Finally, the maximum
distance of this swapping channel is the distance at which the
fidelity drops below $1 - C$:

$\mathrm{distance}_{max} = \log_e(1 - C) / (-\lambda)$  (7)

Realize  that  no  amount  of  error  correction  will  be  ro¬ 
bust  enough  to  support  a  longer  wire,  while  still  supporting 
arbitrarily  long  quantum  computation.  For  this  we  need  a 
more  advanced  architecture.  One  obvious  option  is  to  break 
the  wire  into  segments  and  insert  “repeaters”  in  the  middle. 
These  quantum  repeaters  are  effectively  performing  state 
restoration  (error  correction).  However,  we  can  do  better, 
which  is  the  subject  of  the  next  section. 

operation, where each operation requires 1 µs. It is aggressive, but not
too unreasonable for phosphorus atoms in silicon. We refer the interested
reader to [31].

5 By “practical” we mean without an undue amount of error correction.
The threshold theorem ensures that theoretically we can compute
arbitrarily long quantum computations, but the practical overhead of error
correction makes the real limit 2-3 orders of magnitude higher [33].

Figure 8. Architecture for a Quantum Wire: Solid double lines represent
classical communication channels, while chained links represent a quantum
swapping channel. Single lines depict the direction in which the swapping
channel is being used for transport.


5  Long  Wires 

In  this  section,  we  introduce  an  architecture  for  long  quan¬ 
tum  wires,  shown  in  Figure  8.  These  wires  make  use  of  the 
quantum  primitive  of  teleportation.  Teleportation  involves 
pre-communication  of  EPR  pairs,  followed  by  a  combina¬ 
tion  of  quantum  measurement  and  classical  communication 
to  destroy  a  quantum  state  at  one  end  of  a  wire  and  re-create 
it  on  the  other  end.  The  key  is  that  the  pre-communication 
can  be  pipelined.  Furthermore,  teleportation  allows  quan¬ 
tum  wires  to  convert  quantum  data  between  components 
that  use  different  error  correction  codes,  a  conversion  that 
is  impractical  without  teleportation.  In  the  next  few  sec¬ 
tions,  we  provide  a  brief  introduction  to  the  core  architec¬ 
tural  components  of  this  wire. 

5.1  Basic  Building  Blocks 

Although  teleportation  and  the  mechanisms  described  in 
this  section  are  known  in  the  literature,  what  has  been  miss¬ 
ing  is  the  identification  and  analysis  of  which  mechanisms 
form  fundamental  building  blocks  of  a  realistic  system. 
In this section, we highlight three important architectural
building blocks: the entropy exchange unit, the EPR generator,
and the purification unit. Note that the description
of these blocks is quasi-classical in that it involves
input and output ports. Keep in mind, however, that all
operations  (except  measurement)  are  inherently  reversible, 
and  the  specification  of  input  and  output  ports  merely  pro¬ 
vides  a  convention  for  understanding  the  forward  direction 
of  computation. 

5.1.1  Entropy  exchange  unit 

The physics of quantum computation requires that operations
be reversible and conserve energy. The initial state of
the system, however, must be created somehow. We need
to be able to create zero states, denoted as “$|0\rangle$”.
Furthermore, errors cause qubits to become randomized; stated
equivalently, entropy enters the system through decoherence
caused by coupling with the external environment.

Where  do  these  zero  states  come  from?  The  process  can 
be  viewed  as  one  of  thermodynamic  cooling.  Distributed 
throughout  a  quantum  processor  are  “cool”  quantum  bits  in 
a  nearly  zero  state.  These  can  be  created  by  pulling  spin- 
polarized  electrons  (created,  for  example,  using  a  standard 
technique  known  as  optical  pumping  [23]  [49]  or  directly 
using  spintronics  methods,  with  ferromagnetic  materials 
and  spin  filters  [23])  over  the  phosphorus  atoms. 

To arbitrarily increase the probability that a qubit is in the
zero state (and make an extremely cold zero state) we can use a
variant of the purification technique described in Section 5.1.3.
Specifically, we employ an efficient algorithm for data
compression [38] [39] that gathers entropy across a number of
qubits into a small subset of highly random qubits. As a result,
the remaining quantum bits are reinitialized to the desired pure
zero state $|0\rangle$.

5.1.2  EPR  Generator 

Constructing an EPR pair of quantum bits is straightforward.
We start with two $|0\rangle$ state bits from our entropy
exchange unit. A Hadamard gate is applied to the first of
these quantum bits. We then take this transformed quantum
bit, which is in an equal superposition of a zero and a one
state, and use it as the control bit for a controlled-NOT gate.
The target bit that is to be inverted is the other fresh $|0\rangle$
quantum bit from the entropy exchange unit. A controlled-NOT
gate is a bit like a classical inverter except that the target
bit is inverted only if the control bit is in the $|1\rangle$ state.
Using a control bit of $(|0\rangle + |1\rangle)/\sqrt{2}$ and a target bit
of $|0\rangle$ we end up with a two-bit entangled state of
$(|00\rangle + |11\rangle)/\sqrt{2}$. The quantum bits in this state are
called an EPR pair.

Figure 9. Quantum EPR generator: Solid double lines represent
classical communication (or control), while single lines depict
quantum wires.

Figure 10. Quantum purification unit: EPR states are sufficiently
regular that they can be purified at the ends of a teleportation
channel.

The overall process of EPR generation is depicted in
Figure 9. Schematically the EPR generator has a single
quantum input and two quantum outputs. The input is directly
piped from the entropy exchange unit and the output
is the entangled EPR pair.
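The EPR construction can be traced with explicit state vectors. The following sketch is a small illustration rather than a model of the hardware: it applies a Hadamard gate to the first of two zero-state qubits and then a controlled-NOT, using plain Python lists. The helper `apply` and the basis ordering |00>, |01>, |10>, |11> are assumptions made for this example.

```python
import math

def apply(matrix, state):
    """Apply a gate (list of rows) to a state vector (list of amplitudes)."""
    return [sum(row[j] * state[j] for j in range(len(state))) for row in matrix]

S = 1 / math.sqrt(2)

# Hadamard on the first qubit, identity on the second.
H_FIRST = [[S, 0, S, 0],
           [0, S, 0, S],
           [S, 0, -S, 0],
           [0, S, 0, -S]]

# Controlled-NOT: first qubit is the control, second is the target.
CNOT = [[1, 0, 0, 0],
        [0, 1, 0, 0],
        [0, 0, 0, 1],
        [0, 0, 1, 0]]

zero_zero = [1.0, 0.0, 0.0, 0.0]          # both qubits start in the zero state
after_hadamard = apply(H_FIRST, zero_zero)
epr_pair = apply(CNOT, after_hadamard)

labels = ["|00>", "|01>", "|10>", "|11>"]
for label, amplitude in zip(labels, epr_pair):
    print(f"{label}: {amplitude:.4f}")
```

Only the |00> and |11> amplitudes are nonzero, each equal to 1/sqrt(2), which is the entangled state described above.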

5.1.3  Purification  unit 

The  final  building  block  we  require  is  the  purification  unit. 
This  unit  takes  as  input  n  EPR  pairs  which  have  been  par¬ 
tially  corrupted  by  errors,  and  outputs  nE  asymptotically 
perfect  EPR  pairs.  E  is  the  entropy  of  entanglement,  a 
measure  of  the  number  of  quantum  errors  which  the  pairs 
suffered.  The  details  of  this  entanglement  purification  pro¬ 
cedure  are  beyond  the  scope  of  this  paper  but  the  interested 
reader  can  see  [10,  5,  8]. 

Figure 10 depicts a purification block. The quantum inputs
to this block are the input EPR states and a supply of
$|0\rangle$ bits. The outputs are pure EPR states. Note that the
block is carefully designed to correct only up to a certain
number of errors; if more errors than this threshold occur,
then the unit fails with increasing probability.

5.2  Analysis 

Figure 8 illustrates how we use these basic building blocks
and protocols for constructing a long wire. The EPR generator
is placed in the middle of the wire and “pumps” entangled
quantum bits to each end (via a pipelined swapping channel).
These bits are then purified such that only the error-free
qubits remain. Purification and teleportation consume
zero-state qubits that are supplied by the entropy exchange
unit. Finally, the coded-teleportation unit transmits quantum
data from one end of the wire to the other using the protocol
described in Section 2.3. Our goal now is to analyze this
architecture and derive its bandwidth and latency
characteristics.

The bandwidth is proportional to the speed with which
reliable EPR pairs are communicated. Since we are communicating
unreliable pairs we must purify them, so the
efficiency of the purification process must be taken into
account. Purification has an efficiency that scales roughly as
the square of the fidelity of the incoming, unpurified qubits
[38]:

$\mathrm{purification\ efficiency} \approx \mathrm{fidelity}^2$  (8)

Entropy exchange is a sufficiently parallel process that we
assume enough zero qubits can always be supplied. Therefore,
the overall bandwidth of this long quantum wire is:

$\mathrm{bandwidth}_{true} = \frac{1}{1\,\mu\mathrm{s}} \times e^{-2 \times 10^{-6} \times \mathrm{distance}/60\,\mathrm{nm}}$  (9)

which for a 1 µm wire is 999,967 qbps. Note this result
is less than for the simple wiring scheme, but the decoherence
introduced on the logical quantum bits is only
$O(e^{-\lambda \times 10})$. It is this latter number, which does not
change with wire length, that makes an important difference. In
the previous short-wire scheme we could not make a wire
longer than 6 µm. Here we can make a wire of nearly arbitrary
length. For example, a wire that is 10 mm long has a
bandwidth of 716,531 qbps, while a simple wire has an effective
bandwidth of zero at this length (for computational
purposes).

The situation is even better when we consider latency.
Unlike the simple wire, the wire architecture we propose
allows for the pre-communication of EPR pairs at the
sustainable bandwidth of the wire. These pre-communicated
EPR pairs can then be used for transmission with a constant
latency. This latency is roughly the time it takes to perform
teleportation, or about 20 µs. Note this latency is
much improved compared to the distance-dependent simple
wiring scheme.
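As a numerical check on the long-wire figures, the sketch below evaluates the teleportation-channel bandwidth at 1 µm and 10 mm and contrasts the constant teleportation latency with the distance-dependent latency of a plain swapping channel. It is only an illustration using the parameters quoted in the text; the function names are hypothetical and the 20 µs latency is taken from the paragraph above rather than derived.

```python
import math

SWAP_TIME_S = 1e-6          # time per swap, from the text
QUBIT_SPACING_M = 60e-9     # qubit spacing, from the text
LAMBDA = 1e-6               # decoherence per operation, from the text
TELEPORT_LATENCY_S = 20e-6  # constant latency quoted in the text

def hops(distance_m):
    """Number of qubit positions spanned by the wire."""
    return distance_m / QUBIT_SPACING_M

def teleport_wire_bandwidth_qbps(distance_m):
    """Physical bandwidth scaled by fidelity squared (purification cost)."""
    return (1.0 / SWAP_TIME_S) * math.exp(-2.0 * LAMBDA * hops(distance_m))

def swap_wire_latency_s(distance_m):
    """Latency of a plain swapping channel: one swap time per hop."""
    return SWAP_TIME_S * hops(distance_m)

for label, distance in [("1 um", 1e-6), ("10 mm", 10e-3)]:
    bandwidth = teleport_wire_bandwidth_qbps(distance)
    swap_latency = swap_wire_latency_s(distance)
    print(f"{label}: teleport bandwidth {bandwidth:,.0f} qbps, "
          f"swap-only latency {swap_latency * 1e6:,.0f} us, "
          f"teleport latency {TELEPORT_LATENCY_S * 1e6:.0f} us")
```

At 10 mm the swap-only latency is several orders of magnitude larger than the constant teleportation latency, while the teleportation bandwidth has fallen by less than a third.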

5.2.1  Technology  Independent  Metrics 

Using the same constants defined above for the swapping
channel, we can generalize our analysis of teleportation
channels. The latency is simply:

$\mathrm{latency} \approx 10T$  (10)

The bandwidth is:

$\mathrm{bandwidth}_{true} = \frac{1}{T} e^{-2\lambda D}$  (11)

Unlike the short wire, this bandwidth is not constrained
by a maximum distance related to the threshold theorem
since teleportation is unaffected by distance. The
communication of EPR pairs before teleportation, however, can be
affected by distance, but at a very slow rate. While
purification must discard more corrupted EPR pairs as distance
increases, this effect is orders of magnitude smaller than
direct data transmission over short wires and is not a factor
in any practical silicon chip of up to tens of millimeters on a
side.


6  System  Bandwidth 

Our goal has been to design a reliable, scalable quantum communication layer that will support higher-level quantum error correction and algorithms functioning on top of this layer. A full description of error correction and quantum algorithms is beyond the scope of this paper. A key issue for future evaluation, however, is that the lower latency of our teleportation channel actually translates to even higher bandwidth when the upper layers of a quantum computation are considered. It is for this reason that long wires should not be constructed from chained swapping channels and quantum “repeaters”.

The intuition behind this phenomenon is as follows. Quantum computations are less reliable than any computation technology that we are accustomed to. In fact, quantum error correction consumes an enormous amount of overhead, both in terms of redundant qubits and time spent correcting errors. This overhead is so large that the reliability of a computation must be tailored specifically to the run length of an algorithm. The key is that the longer a computation runs, the stronger the error correction needed to allow the data to survive to the end of the computation. The stronger the error correction, the more bandwidth is consumed transporting redundant qubits. Thus, lower latency on each quantum wire translates directly into greater effective bandwidth of logical quantum bits. For more information on quantum error correction and algorithms, we refer the reader to [31].


7  Conclusion 

Our study has focused on a critical aspect of any quantum computing architecture: quantum wires to transport quantum data. Building upon key pieces of quantum technology, we have provided an end-to-end look at a quantum wire architecture. We have shown that our teleportation channel scales with distance and that swapping channels do not. We have also discovered fundamental architectural pressures not previously considered. These pressures arise from the need to co-locate physical phenomena at both the quantum and classical scale. Our analysis indicates that these pressures will force architectures to be sparsely connected, resulting in coarser-grain computational components than generally assumed by previous quantum computing studies. We believe that further architectural studies of this nature will be valuable in identifying the research challenges facing quantum technologies of the future.
Abstract:

The theoretical study of quantum computation has yielded efficient algorithms for some traditionally hard problems. Correspondingly, experimental work on the underlying physical implementation technology has progressed steadily. However, almost no work has yet been done which explores the architecture design space of large-scale quantum computing systems. In this paper, we present a set of tools that enable the quantitative evaluation of architectures for quantum computers.

The infrastructure we created comprises a complete compilation and simulation system for computers containing thousands of quantum bits. We begin by compiling complete algorithms into a quantum instruction set. This ISA enables the simple manipulation of quantum state. Another tool we developed automatically transforms quantum software into an equivalent, fault-tolerant version required to operate on real quantum devices. Next, our infrastructure transforms the ISA into a set of low-level, microarchitecture-specific control operations. In the future, these operations can be used to directly control a quantum computer. For now, our simulation framework uses them to quickly determine the reliability of the application on the target microarchitecture.

Finally, we propose a simple, regular architecture for ion-trap based quantum computers. Using our software infrastructure, we evaluate the design trade-offs of this microarchitecture.

1  Introduction 

Experimental research into quantum computing technologies has been progressing steadily. Demonstrations of bulk-spin NMR computers [1], ion-trap based designs [2, 3, 4], and optical cavity wells [5, 6] for quantum computation have been performed. The next step in this area is to scale up from experimental quantum computers consisting of a handful of quantum bits to large-scale quantum computing systems. Clearly many technological hurdles still exist, and one of the most basic is the architectural design of these systems.

Why worry about the architecture of a quantum computer now? The most promising technologies are at least five years from demonstrations of a dozen qubits or more, and large-scale systems are not even seriously on the drawing board. Architects, however, can make significant contributions by (1) identifying the serious practical difficulties that will arise from the physical structure of these devices and (2) finding solutions to these and other challenges with the technology.

Identifying the challenges these systems face allows device physicists and quantum theorists to start exploring potential solutions. By understanding the challenges facing the practical implementation of these technologies, architects can find solutions through the proper organization of the structure of these devices. Collectively, this means that computer architects have the potential to hasten the development of a large-scale quantum computer by identifying and solving scalability problems early.

Where to begin with quantum architecture research? As in classical architecture, one begins with the applications. Surprisingly, even though it will be some time before a quantum computer is built, the application that computer will execute is already well known: error correction. Quantum technologies will operate with error rates far higher than classical machines. Experimental error rates of 1E-3 per bit operation have been measured in NMR systems [1]. Technological advances are expected to lower these rates dramatically, but reaching 1E-10 to 1E-12 is considered highly aggressive. There is only one way to manage these errors in a quantum computer: utilizing software error correction on a well-designed quantum computer architecture.
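To make the scale of the problem concrete, the sketch below estimates the chance that an unprotected computation finishes without a single error. It assumes independent, identically distributed errors per operation, which is a simplification introduced here and not a model taken from the paper; the function name is hypothetical.

```python
def success_probability(error_rate: float, operations: int) -> float:
    """Probability that `operations` steps all succeed, assuming each step
    fails independently with probability `error_rate`."""
    return (1.0 - error_rate) ** operations


if __name__ == "__main__":
    # Error rates mentioned in the text, paired with illustrative run lengths.
    for rate, ops in ((1e-3, 1_000), (1e-10, 10**10), (1e-12, 10**10)):
        print(f"rate={rate:g} ops={ops:g} -> {success_probability(rate, ops):.4f}")
```

Under this assumption, a run of roughly 1/error_rate operations already fails more often than it succeeds, which is why programs with billions of operations need error correction even at very aggressive error rates.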

Our research efforts have been devoted to developing these architectures. To conduct this research in a quantitative fashion, we developed an infrastructure consisting of compilation and modeling tools. This paper will spend significant time describing these software artifacts (Sections 3-7) because the methodology for applying architectural principles to quantum computers is one of the primary contributions of this work. All existing work on quantum architectures has produced either hand-designed circuits without considerations for scalability [7] or analytical models for performance and reliability that are unable to scale to systems large enough to solve real-world problems [8].

We  rely  on  appropriate  technological  abstractions  and 
careful  design  of  the  ISA,  scheduler,  and  simulator  to  con¬ 
struct  an  infrastructure  that  scales  (linearly)  to  thousands  of 
qubits  and  billions  of  time  steps.  Briefly,  our  tool  chain  is 
the  following: 
• A compiler from an existing high-level language, which enables the manipulation of quantum bits, to an instruction set architecture we developed for quantum computers.

• An error correction compiler that automatically transforms a quantum ISA assembly text into equivalent fault-tolerant versions.

• A device scheduler that maps an assembly source into a set of device-specific primitive operations for controlling a quantum microarchitecture.

• A simulator that models the reliability of the quantum bits in a quantum computer; performance and reliability metrics for an application running on a targeted microarchitecture can be obtained by using this simulator.

Our tools enable researchers to explore architectural trade-offs directly. Instead of relying on high-level models or mathematical equations to calculate execution time and reliability, our infrastructure provides the compilation, scheduling, and simulation tools needed to compute these results precisely.

Using these tools, in Section 8 we evaluate a few quantum microarchitectures as they perform error correction steps. We find that the realistic constraints exposed by execution on a microarchitecture significantly decrease the acceptable error rates. Idealized theoretical models set the critical threshold (above which sustainable quantum computation is not possible [9]) at approximately 1E-4, but a threshold which accounts for the constraints of the proposed microarchitecture is closer to 1E-9. Our results indicate that more than 4/5 of this difference can be accounted for by resource contention and the impact of ion movement and turning in a real system. Since architects excel at the exploitation of locality and the minimization of resource contention, this suggests that through intelligent design, architects have the potential to have a major impact on the accuracy of quantum computation, thus allowing us to achieve a scalable quantum computer sooner rather than later.

The remainder of this paper is structured as follows. In Section 2 we describe the abstractions we use to make quantum architectures accessible. Section 3 presents an overview of our software infrastructure. The ISA we developed is described in Section 4. Sections 5, 6, and 7 elaborate on the design of our error correction compiler, device scheduler, and simulator. In Section 8 we present the results from our exploration of a simple tile-based quantum microarchitecture. In Section 9 we describe where to go next with this work, and in Section 10 we conclude.

2  Technology  abstraction 

The science of architecture is the optimization of the hardware/software interface. The nuts and bolts of it are examining applications, working with the realistic constraints of the technology, and developing software infrastructures and hardware designs. Research into architectures for quantum computers is no different. To design architectures, a reasonable abstraction for the underlying technology and an understanding of the software applications are required. In this section, we describe a basic set of abstractions for ion trap based quantum computing technology. We will discuss the application characteristics further in Section 4.

We focus our attention on ion trap based designs because they appear to be the most promising in terms of a near-term ability to deliver a system with tens to hundreds of qubits. The cost of these systems will not be insignificant, with estimates in the hundreds of millions of dollars to develop a single prototype. Proper engineering of their architectural design ahead of time will be required to maximize their scientific and national infrastructure value.

For architectural design, we focus on three circuit components: ions, traps, and wires, as depicted in Figure 1. Ions are the entities that realize qubits. The excitation state of the outer electron on a ⁹Be⁺ ion is the actual quantum property used to realize a qubit [3, 4]. A trap is a device that uses classical support circuitry and lasers to perform quantum operations on ions. This gives it a multi-purpose, ALU-like functionality. Quantum operations can only be performed on ions that are located in traps. Inside of the trap, any arbitrary single-qubit operation and a limited number of two-qubit operations, including CNOT and controlled rotation, can be performed. For the two-qubit operations, both ions must be located in the same trap. Wires are just two-sided structures within the design in which ions can move. Wires can contain corners, but care must be taken when moving ions in anything other than a straight line. Ions must move adiabatically (read: slowly) around corners or an unrepairable amount of noise will be introduced.

While the precise timing of all operations is not yet known (it is technology specific and will change as the systems evolve), the relative timing between them, observed from [3, 4], is roughly: moving 1 unit within a wire, 1/10th of a time step; performing a single-qubit operation, 1 time step; performing a two-qubit operation, 10 time steps; turning a corner, including getting into and out of a trap, 100 time steps. Architects should think of the single-qubit operations as the “clock cycle” of the machine. The classical analogy is that these operations are simple and fast, like an addition. Measurement and two-qubit operations are slow, just like complex classical functions such as divide. Later in Section 5 we will present statistics for the relative instruction mix between single/two-qubit operations and measurement.
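The relative timings above can be captured in a small cost table. The sketch below adds up the time of a strictly serial sequence of primitive actions; the dictionary keys and the function name are hypothetical, and costs are stored in tenths of a time step so the arithmetic stays exact.

```python
# Relative costs from the text, in tenths of a time step
# (one time step = one single-qubit operation).
COST_TENTHS = {
    "move": 1,         # moving 1 unit within a wire: 1/10 time step
    "single": 10,      # single-qubit operation: 1 time step
    "two_qubit": 100,  # two-qubit operation: 10 time steps
    "turn": 1000,      # turning a corner (incl. trap entry/exit): 100 time steps
}


def serial_time_steps(counts: dict) -> float:
    """Total time steps for the given action counts, executed one after another."""
    total_tenths = sum(COST_TENTHS[kind] * n for kind, n in counts.items())
    return total_tenths / 10


if __name__ == "__main__":
    # Bring an ion 20 units down a wire, turn once into a trap, apply a CNOT.
    print(serial_time_steps({"move": 20, "turn": 1, "two_qubit": 1}))
```

In this example the single turn accounts for most of the total, which illustrates why layouts that avoid corners matter under this cost model.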

The basic unit for these time steps is approximately 1 µs. For single- and two-qubit operations, this will not change, as it is a fundamental property of the ions [3, 4] used to realize qubits. For movement and turning, it is a function of the technology, and as this develops, they may become faster. Moving, and in particular turning, induces noise (from heat) on the qubit and must be performed slowly so that the state does not decohere. Current experimental work aims for turning to be 50-300 times slower than single-qubit operations [3]. Since controlling noise is so important for quantum architectures, we do not use a highly aggressive turning time for our simulations. We do, however, explore the impact of this parameter on performance in Section 8.

Figure 1: Technological abstraction of ion trap based quantum computers. Moving qubits around corners is significantly slower (greater than 100x) than going in a straight line. Operations are performed inside of traps. For two-qubit operations, the qubits must be in the same trap.

Figure 2: Basic design rules for ion trap systems (panels labeled Good and Bad).

Our review of current ion-trap based designs [3, 7] suggests a few simple design rules that must be observed by architects. These rules are the quantum analog of VLSI design rules:

•  Ion  traps  may  only  abut  one  or  two  wires 

•  Ion  traps  may  not  share  any  sides 

•  Ion  traps  may  not  abut  the  end  of  a  wire 

These rules are depicted in Figure 2. They serve as an additional level of abstraction by removing the need to consider the exact sizing and space tolerances for layouts. Later, in Section 8, we will explore a simple regular architecture that observes these design constraints.
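The three design rules lend themselves to an automated check. The following sketch tests them on a character grid in which "T" is a trap, "W" is a wire cell, and "." is empty space. The grid encoding, the use of 4-connected adjacency for "abut", and the reading of "end of a wire" as a wire cell with at most one wire neighbor are all assumptions made for illustration; the paper defines the rules only pictorially.

```python
def neighbors(grid, r, c):
    """Yield (row, col, cell) for the 4-connected neighbors inside the grid."""
    for dr, dc in ((-1, 0), (1, 0), (0, -1), (0, 1)):
        rr, cc = r + dr, c + dc
        if 0 <= rr < len(grid) and 0 <= cc < len(grid[rr]):
            yield rr, cc, grid[rr][cc]


def check_layout(rows):
    """Return a list of (row, col, message) design-rule violations."""
    grid = [list(row) for row in rows]
    violations = []
    for r, row in enumerate(grid):
        for c, cell in enumerate(row):
            if cell != "T":
                continue
            near = list(neighbors(grid, r, c))
            wires = [(rr, cc) for rr, cc, v in near if v == "W"]
            # Rule 1: a trap may only abut one or two wires.
            if not 1 <= len(wires) <= 2:
                violations.append((r, c, "must abut one or two wires"))
            # Rule 2: traps may not share any sides.
            if any(v == "T" for _, _, v in near):
                violations.append((r, c, "shares a side with another trap"))
            # Rule 3: a trap may not abut the end of a wire.
            for rr, cc in wires:
                links = sum(v == "W" for _, _, v in neighbors(grid, rr, cc))
                if links <= 1:
                    violations.append((r, c, "abuts the end of a wire"))
    return violations


if __name__ == "__main__":
    print(check_layout(["WWWWW", ".T.T."]))  # traps hang off the wire interior
    print(check_layout(["WWW", "T.."]))      # trap sits at a wire end
```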

3  Software  overview 

To evaluate complex conventional systems, architects utilize a variety of software tools. Starting with a (hopefully) representative set of applications, they compile and execute them on sophisticated simulation infrastructures that model different points in the design space. To properly study large-scale quantum computers, we created a corresponding infrastructure. This infrastructure comprises four major components: a source compiler, an error correction compiler, a device scheduler, and a simulator. In this section, we describe what these tools do and how they are used. In the next few sections we elaborate on how they work. Figure 3 contains a pictorial overview of the flow of information through the tools.

Source compiler: To describe quantum algorithms, we utilize the existing QCL [10] work. The QCL toolkit provides an interpreter for a fairly straightforward imperative programming language that includes data types and operation primitives for quantum operations. We did not extend this work significantly except to make minor changes to perform loop unrolling and output instructions in the instruction set described in Section 4.

To allow for aggressive code optimization, we require the application's input data at compile time. The resulting assembly output from the compiler contains only the operations required to perform the algorithm on the provided input data. This may seem limiting, but two related reasons motivate this design choice. First, our expectation is that the time required for a quantum computer to execute an algorithm will be significantly longer than the time required to optimize resource usage for a particular algorithm/input dataset combination.
Figure 3: Quantum architecture research infrastructure. This set of tools enables architects to start with a high-level language description of an algorithm and a microarchitecture and compile, add fault-tolerant steps, schedule for the architecture, and simulate the speed and reliability of the algorithm.


Stated another way, the gains in execution time will be large enough to justify spending significant time “up front” optimizing resource usage. The second reason is that error correction incurs a high overhead, suggesting that it should be applied as minimally as possible. More general representations require more general computation and hence more error correction, while an executable targeted to only a single input dataset can be optimized aggressively for just that dataset. Similar findings have been reported in classical computing, where dynamic optimizers aggressively tailor executables, folding in constants, etc. [11, 12, 13].

Error correction compiler: The output of our source compiler is an assembly text. This assembly assumes an idealized machine, one with no errors. This assumption does not hold: quantum computers will have error rates between 1E-6 and 1E-10 per operation. To counteract this, researchers discovered and explored many different types of quantum error correction [14, 15, 16, 17]. For our purpose, we selected the 7-qubit Steane code [14] and the recursive construction process described in [9]. The error correction compiler takes as input the assembly text that assumed an ideal computer and an “error correction strength” level, and outputs another assembly text that is the same algorithm except with fault-tolerant constructs included. This output text is considerably larger, potentially by several orders of magnitude, but is required to coax the right answer from an otherwise noisy quantum device.

Scheduler: The next step in the tool chain is to schedule the resources of the quantum computer. For classical computing devices, the schedule is implicit in the executable, since the semantics of von Neumann machines are sequential. For quantum computers, sequential semantics are maintained, but the importance of exploiting parallelism increases dramatically. Ignoring parallelism in a von Neumann machine results in a longer execution time, but the computed result does not change. In a quantum computer, ignoring parallelism could result in a wrong answer. Thus our scheduler takes in an assembly text and a description of the microarchitecture of the quantum computer and creates a parallel schedule of operations that should be performed on the actual microarchitecture.

Simulator: Once the application is scheduled onto the physical resources of the machine, the next step in the tool chain is to decide whether or not the application will actually work. Too little error correction or a poor schedule will produce noise instead of the correct answer. The purpose of this step is to determine how reliable the scheduled application will be on the device. If the simulator determines the schedule will be reliable, then we are done. The end results are two facts: how fast the algorithm executed on the microarchitecture and how reliable the result was. If the result is determined to be unreliable, the user has to back up two steps and add more error correction or model a different microarchitecture that might perform better.
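The iterate-until-reliable workflow described above can be sketched as a simple loop. The three callables passed in (`add_error_correction`, `schedule`, `simulate`) are hypothetical stand-ins for the error correction compiler, device scheduler, and simulator; their signatures are assumptions made for this illustration and are not the interfaces of the actual tools.

```python
def find_reliable_schedule(assembly, microarchitecture,
                           add_error_correction, schedule, simulate,
                           target_reliability, max_level=2):
    """Raise the error correction level until the simulated run is reliable.

    Returns (level, runtime, reliability) for the first level that meets the
    target, or None if no level up to max_level does. In the None case the
    user would try a different microarchitecture.
    """
    for level in range(max_level + 1):
        protected = add_error_correction(assembly, level)
        plan = schedule(protected, microarchitecture)
        runtime, reliability = simulate(plan, microarchitecture)
        if reliability >= target_reliability:
            return level, runtime, reliability
    return None


if __name__ == "__main__":
    # Toy stand-ins: each level costs 7x more time and cuts the error 10x.
    ec = lambda asm, level: (asm, level)
    sched = lambda prog, arch: prog
    sim = lambda plan, arch: (10.0 * 7 ** plan[1], 1.0 - 10.0 ** -(plan[1] + 1))
    print(find_reliable_schedule("h q0", "tile", ec, sched, sim, 0.95))
```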

A schedule that shows a high degree of reliability under simulation is detailed enough to control the physical computer during the execution of the algorithm and dataset. This step is beyond the scope of this work, but basically, it involves translating the schedule using a fairly straightforward mapping between operation steps and the pulses that control the actual quantum computer.

4  Instruction  set  architecture 

The design of an instruction set architecture (ISA) encompasses many different pieces. The most fundamental is the execution model, which describes how a machine will process a group of instructions. Next are the resources available in the machine, typically memories and their interface. Finally, there are the actual instructions themselves. Figure 4 describes the ISA we designed.

The ISA we describe here is a “high-level ISA” which is not directly executable by any quantum computer. These ISAs have also been referred to as “virtual ISAs” [18] and linear intermediate representations. The purpose of this ISA is to provide a workable representation of an application. By workable, we mean that it has relatively straightforward semantics, tools can process it largely piecemeal, operation by operation, and it can be translated in a direct way to the actual control sequences a quantum computer requires.

Execution model: We base our ISA on the von Neumann execution model. This means that conceptually, the quantum computer can be thought to fetch, decode, and execute the primitive operations one by one. This design choice is motivated by an additional restriction we place on execution: quantum programs may not contain branches. All loops must be fully unrolled, and all conditionals must be converted into predicates (see Figure 5).
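The two transformations named above, full loop unrolling and conversion of conditionals into predicates, can be sketched on flat instruction strings. The helper names are hypothetical, and the `@!bit` spelling for a negated predicate is an assumption made for this illustration; the ISA format only specifies an optional `@cond` prefix.

```python
def unroll(start, stop, body):
    """Fully unroll a loop over start..stop (inclusive).

    `body` maps the loop index to a list of instruction strings.
    """
    out = []
    for i in range(start, stop + 1):
        out.extend(body(i))
    return out


def predicate(cond_bit, then_ops, else_ops):
    """Turn an if/else on a classical bit into branch-free predicated ops."""
    guarded = [f"@{cond_bit} {op}" for op in then_ops]
    guarded += [f"@!{cond_bit} {op}" for op in else_ops]  # assumed negation syntax
    return guarded


if __name__ == "__main__":
    # Mirrors the shape of the QCL example in Figure 5: a loop of Hadamards
    # followed by a measurement-dependent Not.
    program = (
        ["X q1"]
        + unroll(0, 2, lambda i: [f"H q{i}"])
        + ["CNot q1, q0", "measure c0, q0"]
        + predicate("c0", ["X q1"], ["X q2"])
    )
    print("\n".join(program))
```

Because the result contains no branches, its length is fixed at compile time, which is what allows every instruction to be scheduled and simulated ahead of execution.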

The reason we chose to restrict applications in this way is that it enables our software infrastructure to provide developers with concrete reliability results. Since there are no branches in the compiled binary, every instruction must be scheduled onto a quantum microarchitecture. Once scheduled, our simulator can provide a very precise answer to the question, “Will it work?”

If branches were part of the ISA, then answering that question would no longer be possible. The schedule of low-level operations could vary significantly from one execution run to the next based upon branch outcomes, which, in quantum software, depend largely on the random noise the system experiences and corrects for. These variances make it far more difficult to predict reliably whether or not the schedule will actually compute correctly.

Resources: In our high-level ISA, we assume an infinite number of quantum and classical memory locations are available. Memory is split into two segments, a quantum segment and a classical segment. Quantum bits (qubits) are referred to as qName1, qName2, ..., while classical bits are referred to as cName1, cName2, .... Since this is a high-level ISA, there is no need to restrict the names of bits to simple numerical addresses, as a simple compilation pass prior to scheduling can assign device-specific addresses and resolve any false dependencies caused by name reuse. We do not use a hierarchical memory (i.e., there is no distinction between memory and registers).
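A minimal sketch of the address-assignment part of such a pass is shown below. It maps symbolic names to dense numeric addresses, with one address space per segment selected by the leading `q` or `c`. The tuple-based instruction representation and the function name are hypothetical, and the sketch does not attempt the renaming needed to remove false dependencies.

```python
def assign_addresses(instructions):
    """Replace symbolic bit names with per-segment numeric addresses.

    `instructions` is a list of (op, operands) pairs. Operands that start
    with 'q' or 'c' are treated as bit names; anything else (for example a
    rotation angle) is passed through unchanged.
    """
    tables = {"q": {}, "c": {}}
    resolved = []
    for op, operands in instructions:
        new_operands = []
        for name in operands:
            segment = name[:1]
            if segment in tables:
                table = tables[segment]
                # First use of a name gets the next free address in its segment.
                address = table.setdefault(name, len(table))
                new_operands.append(f"{segment}{address}")
            else:
                new_operands.append(name)
        resolved.append((op, tuple(new_operands)))
    return resolved, tables


if __name__ == "__main__":
    program = [
        ("h", ("qAlice",)),
        ("cnot", ("qBob", "qAlice")),
        ("rot", ("qBob", "0.5")),
        ("measure", ("cOut", "qBob")),
    ]
    for line in assign_addresses(program)[0]:
        print(line)
```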

Operations: The instruction set we have devised operates on both classical and quantum data. The classical operations are fairly ordinary and encompass a straightforward set of opcodes (logic, arithmetic, etc.). For brevity, we do not describe them in detail because they follow a typical three-operand RISC-like format: cOutput = cInput1 op cInput2.

The quantum opcodes are summarized in Figure 4. These operations provide a fairly basic, yet complete set of operations for manipulating quantum state. There are many things to note about this instruction set. First, all quantum operations (except measurement) are, by definition, reversible. This is a constraint imposed by the quantum computing model itself. One implication of this is that there is no distinction between input and output operands. Instead, operations transform all of their operands. Second, this ISA is a balance between high-level primitives, such as multi-qubit complex operations, and low-level device-specific controls. Our guiding principle has been to design an ISA with primitive enough operations that a clear 1:N mapping exists between the operations and device-specific control sequences, but high-level enough that the tool infrastructure could manipulate a useful block of related work.

Instruction set format:

    [@cond]  op  operands

    @cond      optionally perform operation only if conditional is true
    op         operation to perform
    operands   quantum or classical memory locations to operate on

Quantum Operations:

    h          qN             Basic quantum primitives such as Hadamard (H),
    x          qN             invert (X), invert phase (Z), arbitrary
    z          qN             rotation (R), and phase gate (S)
    rot        qN,real
    s          qN
    v          qN,qC,real     rotate qN about X axis, conditional on qC, by real
    cnot       qN,qC          flip qN conditional on qC
    swap       qN1,qN2        swap qN1 and qN2
    toffoli    qN,qC1,qC2     flip qN conditional on qC1, qC2
    measure    cT,qN          measure qN, place result in cT

Pseudo-operations:

    .exchange  qN1,qN2        move qN1, qN2 together (no operation)
    .new       qN | cN        allocate/deallocate a new quantum or
    .free      qN | cN        classical bit under name N

Figure 4: The instruction set architecture for quantum computers

QCL code from Omer:

    procedure Example() {
      qureg q[3];
      int m;

      Not(q[1]);
      for m=0 to #q-1 {
        H(q[m]);
      }
      CNot(q[1], q[0]);
      measure q[0],m;
      if m==1
        Not(q[1]);
      else
        Not(q[2]);
    }

Compiled assembly (the if/else is handled by predicate conversion):

    .new      q0, q1, q2
    .new      c0
    X         q1
    H         q0
    H         q1
    H         q2
    CNot      q1, q0
    measure   c0, q0
    .free     q0            (quantum state destroyed by measurement)
    @c0  X    q1
    @!c0 X    q2
    .free     c0

Figure 5: The compiler, based on QCL [10], transforms QCL source text into assembly.
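Because each instruction follows the single pattern `[@cond] op operands`, tools can process an assembly text one line at a time. The sketch below parses one such line; the concrete textual syntax it accepts (whitespace between fields, comma-separated operands) is an assumption for illustration, and `Instruction` and `parse_instruction` are hypothetical names.

```python
from typing import NamedTuple, Optional, Tuple


class Instruction(NamedTuple):
    cond: Optional[str]        # classical bit guarding the op, if any
    op: str                    # opcode or pseudo-op such as ".new"
    operands: Tuple[str, ...]  # memory locations or literal values


def parse_instruction(line: str) -> Instruction:
    """Parse one '[@cond] op operands' line into its three fields."""
    fields = line.split(None, 1)
    if not fields:
        raise ValueError("empty instruction")
    cond = None
    if fields[0].startswith("@"):
        cond = fields[0][1:]
        fields = fields[1].split(None, 1) if len(fields) > 1 else []
        if not fields:
            raise ValueError("predicate without an operation")
    op = fields[0]
    rest = fields[1] if len(fields) > 1 else ""
    operands = tuple(part.strip() for part in rest.split(",") if part.strip())
    return Instruction(cond, op, operands)


if __name__ == "__main__":
    for text in ("cnot q0, q1", "@c0 x q1", "rot q2, 0.5", ".new q3"):
        print(parse_instruction(text))
```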


5  Error  correction  compiler 


Figure 6: The goal of scheduling is to transform a program source (left) into a sequence of primitive operations that move and manipulate ions in a quantum computer (right). In the illustration, a program of five operations (op 1 to op 5) on qubits q0, q1, and q2 is scheduled into steps such as applying op 2 to q1, moving q0 near q1 for op 3, and moving q2 near q1 for op 5. A main requirement is that the scheduler operate in O(instructions) time because the number of instructions is in the billions for a complete fault-tolerant run of Shor's algorithm.


After compilation to an assembly text, the next step is to transform the application into a fault-tolerant version. Fault-tolerant quantum computing is done in much the same way it has been done for decades in the classical domain: through redundancy. Each quantum bit is encoded into a logical qubit. Logical qubits utilize several physical qubits to store a coded version of the quantum state. Each operation on the original qubit is transformed into an equivalent set of operations on the qubits that make up the logical qubit. The major drawback in all of these schemes is the increase in the number of qubits required.

In our system, we utilize the 7-qubit Steane code [14] and employ the recursive error correction constructions described in [9]. We do this with an assembly source-to-source translator. This translator converts a compiled quantum application into an equivalent assembly source file which contains the embedded error correction operations.
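The space cost of recursive encoding can be estimated with a one-line formula: if every qubit at one level is encoded into 7 qubits at the next, a logical qubit at level L occupies 7^L physical data qubits. The sketch below tabulates this for small levels. It counts data qubits only and ignores the ancilla qubits that fault-tolerant correction also needs, so it is a lower bound under that stated assumption; the function name is hypothetical.

```python
STEANE_BLOCK = 7  # physical qubits per encoded qubit in the Steane code


def data_qubits_per_logical(level: int) -> int:
    """Physical data qubits per logical qubit after `level` recursive
    encodings (ancilla qubits are not counted)."""
    if level < 0:
        raise ValueError("level must be non-negative")
    return STEANE_BLOCK ** level


if __name__ == "__main__":
    for level in range(3):
        print(level, data_qubits_per_logical(level))
```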

The precise fault-tolerant constructions are not a contribution of our work. We base them on prior work [14, 9, 19] and refer the interested reader there. However, to the best of our knowledge, our tool is the first to apply them automatically to an application, accounting for all of the required ancilla preparation work and at multiple strength levels (0, 1, and 2 levels of error correction). Because of this, we have calculated some useful statistical properties about its output.

The results are shown in Figure 5. Architects should take
note of the overhead in both time and space introduced by
the error correction processes. The critical path for a
single-qubit operation with one layer of error correction (EC1) is
31 operations long. Only 1 of those is devoted to actually
performing the operation on the logical qubit. The rest are
devoted to the fault-tolerant correction step. More
realistically, not all operations will be conducted in parallel, the
overhead will be substantially higher, and stronger levels of
error correction will be required. This sizable overhead is
one of the reasons we can design quantum computing
architectures now - Amdahl’s Law [20] suggests quantum
computers are going to spend all of their time error correcting!
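
The Amdahl's Law argument can be sketched numerically. The snippet below uses only the 31-operation critical path quoted above; the function names are hypothetical, and the "speed up the useful operation" scenario is an illustrative assumption rather than an experiment from the study.

```python
def useful_fraction(critical_path_ops, useful_ops=1):
    """Fraction of the critical path spent on the actual logical operation."""
    return useful_ops / critical_path_ops


def amdahl_speedup(fraction_accelerated, factor):
    """Classic Amdahl's Law: overall speedup when a fraction of the work
    is accelerated by `factor` and the remainder is unchanged."""
    return 1.0 / ((1.0 - fraction_accelerated) + fraction_accelerated / factor)


if __name__ == "__main__":
    f = useful_fraction(31)
    print(f"useful work on the EC1 critical path: {f:.1%}")
    # Even an enormous speedup of the single useful operation barely helps,
    # because the error correction steps dominate the critical path.
    print(f"overall speedup if that operation were 1000x faster: "
          f"{amdahl_speedup(f, 1000.0):.3f}x")
    # Conversely, halving the error correction time nearly halves the total.
    print(f"overall speedup if error correction were 2x faster: "
          f"{amdahl_speedup(1.0 - f, 2.0):.3f}x")
```

Under these assumptions, optimizing the error correction steps is where essentially all of the achievable speedup lies.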


6  Device  scheduler 

Once we have a source assembly file with the error correction
compiled in, the next step is to produce a schedule for those
operations on an ion trap computer micro-architecture.
Figure 6 depicts the overall goal: given a source assembly text
(represented in graph form on the left), the scheduler
produces the parallel sequence of low-level operations (right).
In this section, we describe the scheduling process.

6.1  Input 

The scheduler takes three inputs: the source assembly
text to be scheduled, a description of the architecture to
schedule the source on, and a description of the technology
parameters and constraints. The source text has been
previously described (Section 4).

The architecture description is a low-level description of
the ion trap layout. As a classical analogy, this is at the
same level as a VLSI layout produced with tools such as
Magic [21]. The description includes the precise X/Y
coordinates of ion traps, the operations each trap can perform,
and their interconnection wiring.

The technology parameters provided to the scheduler
contain timing information for all device-specific operations.
This includes the timing of all gate operations (X, H, CNOT,
etc.) and the timing for moving ions around the computer.
Specific movement parameters are included for moving ions
through wires, into and out of wires, and for turning corners.
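
As an illustration of what these two descriptions might contain, here is a minimal sketch. All class names, field names, and timing values are hypothetical placeholders chosen for the example; the actual file formats used by the scheduler are not specified in this text.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Trap:
    """One ion trap: its position and the operations it supports."""
    name: str
    x: int
    y: int
    ops: frozenset


@dataclass
class Architecture:
    """Low-level layout: traps plus the wires that interconnect them."""
    traps: dict   # trap name -> Trap
    wires: list   # (trap name, trap name) pairs


@dataclass
class TechParams:
    """Timing parameters, in arbitrary time units (placeholder values only)."""
    op_time: dict         # operation name -> duration
    move_per_cell: float  # moving one cell along a wire
    wire_enter: float     # moving from a trap into a wire
    wire_exit: float      # moving from a wire into a trap
    turn: float           # turning a corner


def move_time(tech, cells, turns):
    """Time to leave a trap, travel `cells` cells with `turns` corners,
    and enter the destination trap."""
    return (tech.wire_enter + cells * tech.move_per_cell
            + turns * tech.turn + tech.wire_exit)


if __name__ == "__main__":
    arch = Architecture(
        traps={"t0": Trap("t0", 0, 0, frozenset({"X", "H", "CNOT"})),
               "t1": Trap("t1", 3, 1, frozenset({"X", "H"}))},
        wires=[("t0", "t1")],
    )
    tech = TechParams(op_time={"X": 1.0, "H": 1.0, "CNOT": 10.0},
                      move_per_cell=1.0, wire_enter=2.0, wire_exit=2.0,
                      turn=5.0)
    print(len(arch.traps), "traps; t0 -> t1 move takes",
          move_time(tech, cells=4, turns=1), "time units")
```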

6.2  Scheduling  algorithm 

The ability to process billions of operations was paramount
in designing the scheduler. Therefore, we chose to trade off
optimality for speed. One of the major costs in scheduling
is determining the route an ion should take to travel between
traps that do not abut the same wire. This problem has
parallels in the routing of signals between logic units within
FPGAs, so we adapted the PathFinder algorithm [22] to create
a collection of efficient paths from source to destination in
the micro-architecture. The biggest change is that we
compute 5-10 paths on the first movement between a
source-destination pair. This computation only occurs once, and
subsequent movements simply pick the best path from those
stored.

                              time (ops)        space (qubits)
        % CNOT   % measure    min     max       min     max
no EC   -        -            1       1         1
EC 1    56.37%   10.31%       31      447       16      96
EC 2    56.74%   9.44%        211     217529    58      12456

Table 1: Properties of error correction. In this table, we present results from our error correction compiler and scheduler. The first two columns
indicate the percentage of two-qubit CNOT gates and measurement operations. The next two columns (min/max time) indicate the time (in ops) required
to perform the error correction process. Minimum time refers to doing things maximally parallel (no architectural constraints), while maximum refers
to doing all operations sequentially. Min/max space refers to the number of physical qubits required to perform the operation. Minimum space comes
from doing all operations sequentially (scheduled perfectly), while maximum space comes from doing as many operations in parallel as possible, reducing
execution time but increasing resource requirements.
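
The compute-once, reuse-afterwards idea can be sketched as follows. This is not the PathFinder algorithm itself: as a stand-in, the sketch enumerates the k shortest simple paths by breadth-first search, and the names `PathCache`, `best_path`, and `busy` are hypothetical.

```python
from collections import deque


def k_shortest_simple_paths(adj, src, dst, k):
    """Return up to k loop-free paths from src to dst, fewest hops first.

    Breadth-first expansion of partial paths visits candidates in
    non-decreasing hop count. Worst-case cost is exponential, which is
    acceptable here because it runs once per source-destination pair.
    """
    found = []
    queue = deque([[src]])
    while queue and len(found) < k:
        path = queue.popleft()
        node = path[-1]
        if node == dst:
            found.append(path)
            continue
        for nxt in adj[node]:
            if nxt not in path:
                queue.append(path + [nxt])
    return found


class PathCache:
    """Computes candidate paths on first use and reuses them afterwards."""

    def __init__(self, adj, k=5):
        self.adj = adj
        self.k = k
        self.cache = {}

    def best_path(self, src, dst, busy=frozenset()):
        """Pick the first cached path whose interior nodes are all free."""
        key = (src, dst)
        if key not in self.cache:
            self.cache[key] = k_shortest_simple_paths(self.adj, src, dst, self.k)
        for path in self.cache[key]:
            if not any(node in busy for node in path[1:-1]):
                return path
        return None  # caller delays the move and retries later


if __name__ == "__main__":
    # A tiny square of junctions: A-B-D and A-C-D.
    adj = {"A": ["B", "C"], "B": ["A", "D"], "C": ["A", "D"], "D": ["B", "C"]}
    cache = PathCache(adj)
    print(cache.best_path("A", "D"))
    print(cache.best_path("A", "D", busy={"B"}))
```

Storing several alternatives per pair lets later movements route around a congested wire without repeating the search.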

The next step is to parse the source assembly and schedule
the operations. For this we employ a variant of
list scheduling [23]. First, the source text is parsed to
completion and a graph representation is produced. We process this
graph in reverse order, starting from the leaves and
proceeding to the root(s). By applying an earliest-possible greedy
approach in reverse, we approximate latest-possible
scheduling if run forward in time. From the view of the simulator,
qubits are allocated only when absolutely necessary, allowing
reuse of “scratch” bits and attempting to minimize the time
that qubits must stay coherent.

At any given time point, the scheduler maintains a list of
operations that can be scheduled and attempts to allocate the
physical resources of the machine for the required number
of time steps. In the case of operations, this means simply
holding onto the ion trap for that time. For movement, it
means referring to the pre-computed path data structure and
choosing a path that has no conflicts for the time required.
Operations that cannot be scheduled due to resource conflicts
are simply delayed, and another scheduling attempt is made
at the next opportune time step.
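
The reverse greedy pass and the delay-on-conflict rule can be sketched together. This is a simplified illustration, assuming unit-time operations, one operation per trap per time step, and no ion movement; the function name and the input format are hypothetical.

```python
from collections import defaultdict


def reverse_list_schedule(ops):
    """Greedy list scheduling run backwards over the dependency graph.

    `ops` maps an operation name to (trap, [names of operations it depends on]).
    Working from the leaves (operations nothing else consumes) toward the
    roots, each operation takes the earliest free slot in reversed time.
    Flipping the time axis then yields an approximately as-late-as-possible
    forward schedule.
    """
    consumers = defaultdict(list)
    for name, (_, deps) in ops.items():
        for dep in deps:
            consumers[dep].append(name)

    unscheduled_consumers = {name: len(consumers[name]) for name in ops}
    ready = sorted(n for n, c in unscheduled_consumers.items() if c == 0)
    busy = set()    # (trap, reversed time step) pairs already allocated
    rev_time = {}

    while ready:
        name = ready.pop(0)
        trap, deps = ops[name]
        # Must come (in reversed time) after everything that consumes it.
        t = max((rev_time[c] + 1 for c in consumers[name]), default=0)
        while (trap, t) in busy:   # resource conflict: delay one step
            t += 1
        busy.add((trap, t))
        rev_time[name] = t
        for dep in deps:
            unscheduled_consumers[dep] -= 1
            if unscheduled_consumers[dep] == 0:
                ready.append(dep)

    makespan = max(rev_time.values()) + 1
    return {name: makespan - 1 - t for name, t in rev_time.items()}


if __name__ == "__main__":
    program = {
        "prep": ("T0", []),
        "h": ("T0", ["prep"]),
        "anc": ("T1", []),
        "cnot": ("T1", ["h", "anc"]),
    }
    for name, step in sorted(reverse_list_schedule(program).items(),
                             key=lambda item: item[1]):
        print(step, name)
```

In the example program, the ancilla preparation `anc` has no predecessors but is still placed immediately before the `cnot` that consumes it, which is the behavior that keeps qubits allocated for as short a time as possible.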

7  Simulator 

The scheduler produces an exact set of command sequences
for controlling a quantum computer. One can directly read
the tail end of this schedule to determine the running time of
the application. Of critical importance, however, is whether
or not the qubits will contain correct values. Noise
(decoherence) could have corrupted them so much that the schedule
will not produce any meaningful result from a quantum
device.


In all other quantum research projects, a precise physical
level simulator of the device is used to determine the
reliability. In our study, however, we are interested in
computers with hundreds to thousands of qubits. Since the running
time of precise simulation is exponential in the number of
entangled qubits, the number of qubits that can be simulated
in reasonable time with current technology (clusters of
machines, days of time) is in the low 30s [24]. Clearly, this
approach will not work for 100 to 100,000 qubits.

Instead, we make the observation that if you do not care
about simulating the precise state of a quantum computer,
Monte Carlo simulation can be used to produce an expected
reliability for the device. With Monte Carlo simulation, the
expected probability of a phenomenon is determined by
performing an action several times and calculating what
percentage of the time the phenomenon in question occurs.

In our case, the phenomenon in question is the
introduction of error into an ion’s quantum state. To perform our
simulation, we start with a base error rate for each step of
computation. This base error rate represents the probability
that an error occurs in an ion at each time-step. We
introduce an error in the ion when our pseudo-random number
generator [25] produces a result less than this base error.

Within the simulation, errors are propagated based on the
dependencies of the computation. Once an ion is in error,
it stays in error and introduces error on any other ions it
interacts with. The only exception to this rule is when error
correction is applied. Our simulation framework models the
effect of the error correction added prior to scheduling. Once
an error correction is completed, the simulator examines the
qubits of the logical code word. If only one qubit is in error,
then the simulator assumes the error correction process fixed
that single qubit error. If two or more qubits are in error,
it propagates the error and assumes all qubits of that code
word are now in error (the upper bound of the effect of error
correction on a terminally broken code word).
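
The error-counting rules above can be sketched in a few lines. This is a simplified illustration, not the simulator itself: it draws fresh errors only on qubits touched by a gate (idle decoherence and movement are not modeled), treats the error correction step as an idealized marker, and uses Python's built-in generator rather than the generator of [25]. The schedule format and function names are hypothetical.

```python
import random


def run_trial(schedule, num_qubits, base_error, rng):
    """Track which qubits are in error after one pass over the schedule.

    `schedule` is a list of (kind, qubits) steps, where kind is "gate"
    (any one- or two-qubit operation) or "ec" (error correction completed
    on the code word made up of `qubits`).
    """
    in_error = [False] * num_qubits
    for kind, qubits in schedule:
        if kind == "ec":
            bad = [q for q in qubits if in_error[q]]
            if len(bad) == 1:
                in_error[bad[0]] = False      # single error: corrected
            elif len(bad) >= 2:
                for q in qubits:              # uncorrectable: whole word lost
                    in_error[q] = True
            continue
        for q in qubits:
            if rng.random() < base_error:     # fresh error on this step
                in_error[q] = True
        if any(in_error[q] for q in qubits):  # errors spread on interaction
            for q in qubits:
                in_error[q] = True
    return in_error


def estimate_failure_rate(schedule, num_qubits, base_error, trials=100, seed=0):
    """Fraction of trials that end with at least one qubit in error."""
    rng = random.Random(seed)
    failures = sum(any(run_trial(schedule, num_qubits, base_error, rng))
                   for _ in range(trials))
    return failures / trials


if __name__ == "__main__":
    block = tuple(range(7))
    schedule = [("gate", (0,)), ("gate", (0, 1)), ("gate", (2,)), ("ec", block)]
    for p in (1e-3, 1e-2, 1e-1):
        rate = estimate_failure_rate(schedule, 7, p, trials=10000)
        print(f"base error {p:g}: estimated failure rate {rate:.4f}")
```

Because each trial only updates one flag per qubit, the cost of a trial is linear in the number of scheduled operations, independent of how entangled the real quantum state would be.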

Naturally, the effectiveness of Monte Carlo simulation
depends on the randomness of the pseudo-random number
generator used. For this purpose, we have selected a random
number generator based on bit-rotation and addition which
is considered particularly well suited to Monte Carlo
simulation [25].

Figure 7: Basic tile structure (left) and substrate micro-architecture
(right). The key parameters (ion trap cells, connectivity, and substrate size)
are varied in this study. [Figure annotations: vary the quantity of
interconnect in each unit; vary dimensions of computing substrate.]

The simulator can be used to quickly (O(n) in the number
of scheduled operations) determine whether or not a
schedule and micro-architecture will operate accurately. Since the
problems targeted by quantum computers are in the
complexity class NP (a super-set of NP-complete), despite the
fact that solutions to these problems are hard to generate,
verification is possible in polynomial time. This means that
accuracy need only be around 90% (since incorrect answers
can be quickly detected and the process re-run if necessary).
The implication is that the reliability of the system can be
determined with a mere 100 trials on our simulator.

In addition to a quick test of the reliability of a
quantum program and micro-architecture pair, the simulator can
also execute an arbitrary number of trials to achieve a
fine-grained understanding of the rate of error. In the next section
of this paper, we use this technique to explore variations on
a canonical quantum micro-architecture and measure their
runtime performance and critical thresholds [9].

8 Micro-architecture exploration

In this section, we use our infrastructure to explore basic
micro-architectural trade-offs. We begin by first
validating the simulation model. Next, we use the tools to
explore trap width versus wiring density in a simple quantum
micro-architecture. Finally, we conclude by exploring the
differences between quantum computing theory and
practice, which highlights both the challenges for future
technology development and the importance of architecture to
this discipline.

8.1 Validation

Since these tools are the first of their kind and our
simulation methodology is a novel approach to modeling
reliability in quantum systems, some form of validation is
desired. To do this, we produced a single fault-tolerant error
correction sequence. This was scheduled onto an
architecture and processed by our simulator. The parameters the
simulator used to model error were changed such that
moving ions around the micro-architecture occurred in zero time,
ions that were not being operated on had zero chance of
decohering, and CNOT instructions required the same amount
of time as single-qubit gates. These parameters match the
theoretical model of quantum computing that is used in the
literature. Doing this, we found the critical threshold - the
maximum error per operation for sustainable fault-tolerant
computation - to be 4E-4. This is exactly in line with what
one would expect from the theoretical estimate previously
calculated [9, 19].

8.2  Exploration 

A  basic  design  of  a  quantum  micro-architecture  is  depicted 
in  Figure  7.  The  concept  is  to  use  a  substrate  of  identical 
tiles.  This  design  has  two  basic  micro-architectural  knobs 
to  vary:  the  number  of  ion  traps  in  a  tile  and  the  amount  of 
wiring  between  tiles. 

To explore the effects of these two parameters on
execution time, we mapped the error-correction (level 1) process
onto varying substrates using our scheduler. We chose
layouts that provided 150 traps total and organized the tiles to
be as square as possible.

The scheduler is non-deterministic (being based on a
synthesis of PathFinder and list scheduling), so results vary
slightly between runs. Therefore we execute each test 8
times and average the results. The overall results are shown
in Figure 8.

The results show three interesting trends. First, except for
the smallest design, small numbers of traps per tile are
favored. With too few traps, scheduling becomes more difficult:
ions must move into and out of regions too often,
increasing execution time. With too many traps, the conflicts over
the single wire into the tile begin to counter the increased
potential for intra-tile movement. With larger trap numbers,
the ions must also move further, leading to longer execution
times.

Second, beyond 2 traps / tile, a single surrounding wire
(which is 2 wires between ion trap complexes; Figure 7)
is less efficient than having more interconnect. However,
moving from 2 to 3 surrounding wires provides no real
savings. Looking carefully at each trap complex configuration,
there is a corresponding ideal interconnect width: 2 traps
/ 1 wire, 3 traps / 2 wires, 5 traps / 3 wires. This pairing
arises from the scheduler’s ability to exploit trap resources
and wire resources, and it makes intuitive sense that the
order of these (more traps - more wires) is aligned up until 5
traps. Beyond 5, the resources are not exploited well by the
scheduler and simply increase delay.

Figure 8: Performance of error-correction on various trap configurations,
from designs built from tiles that are 1 trap high and 1 wire in between, to 9
traps and 3 wires. (Horizontal axis: traps / tile.)

The final observation is that 2 traps / tile and a surrounding
wire per tile performs best on average for this single
application. For our scheduler algorithm and the error correction
process, this design point minimizes overall length of travel
for ions and balances trap complex size against interconnect
size.

8.3 Dealing with architectural reality in quantum computers

We conclude our study by examining the critical threshold
- the error rate above which error correction processes will
not work. In the past [9], theorists have estimated this
threshold using an overly idealized model of computation that did
not account for the actual micro-architecture of the machine.
Using our tools, we can account for this.

Figure 9 plots the reliability of fault-tolerant operations
as various technology and architectural features are
progressively accounted for. The x-axis of this graph is the rate of
error for a single-qubit gate. The y-axis of this graph depicts
the rate of error for the qubit measured by our simulator.
The straight line depicts the rate of error for a non-encoded,
non-fault-tolerant single-qubit operation. The x-axis points
at which the other curves cross this line are their critical
thresholds.

The first line (farthest to the right) is the theoretical
quantum computing model. In this model, there is no accounting
for architecture or technology implications - such as
movement, turns, the difference between single- and two-qubit
gates, and the reality that a quantum state naturally
decoheres with time, even if no operation is performed on it.
This line crosses the non-fault-tolerant line where prior
literature [9, 19] estimates it should.

Figure 9: Observed error rates for different technology and architectural
assumptions.

The next line over, "architecture only", alters the model to
begin to consider the implications of having to perform error
correction in a real micro-architecture. For this calculation,
the impact of decoherence from having to wait for resources
to become available is introduced.

Next comes the "architecture and cnot" line. This line
depicts the effects of the micro-architecture and accounts for
the fact that CNOT gates require an order of magnitude more
time to operate than single-qubit gates.

The final result, "everything accounted for", is one of the
main results of our work. In this trial, we introduce the full
impact of movement and turns. We found that when
operating on an actual micro-architecture and accounting for all of
the implications of scheduling, resource conflicts, the cost of
moves and turns, and single- versus two-qubit gates, the true
threshold lies at ~1E-9. This is lower than the theoretical
calculation by 5 orders of magnitude.

An important observation from this data is that of these
5 orders of magnitude in difference between the theoretical
model and the actual implementation, 3 are the
result of movement and turning, ≈ 1.5 are the result of basic
resource contention, and only ≈ 1/2 is the result of the
increased cost of binary operations such as CNOT.
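
As a consistency check on this breakdown, the short sketch below adds up the quoted per-effect losses and applies them to the 4E-4 theoretical threshold from the validation run. The variable and function names are hypothetical, and the per-effect figures are the approximate values quoted above, so the result is only a rough estimate.

```python
import math

THEORETICAL_THRESHOLD = 4e-4  # idealized model (validation run)

# Approximate orders of magnitude lost to each effect, as quoted in the text.
LOSS_ORDERS = {
    "movement and turning": 3.0,
    "resource contention": 1.5,
    "two-qubit (CNOT) gate cost": 0.5,
}


def practical_threshold(theoretical, loss_orders):
    """Scale the theoretical threshold down by the summed orders of magnitude."""
    total = sum(loss_orders.values())
    return theoretical * 10.0 ** (-total)


total_orders = sum(LOSS_ORDERS.values())
estimate = practical_threshold(THEORETICAL_THRESHOLD, LOSS_ORDERS)

if __name__ == "__main__":
    print(f"total loss: {total_orders} orders of magnitude")
    print(f"implied practical threshold: {estimate:.1e}")
    print(f"orders below theoretical: "
          f"{math.log10(THEORETICAL_THRESHOLD / estimate):.1f}")
```

The three contributions sum to the stated 5 orders of magnitude, and the implied value lands within a small factor of the reported ~1E-9, which is as close as the rounded figures allow.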

The implication of this is that improving the accuracy of
individual quantum operations will only have a minimal
impact on the overall accuracy of quantum computation.
Instead, our work indicates that physicists should focus on
reducing the error rate and improving the execution time for
turns, while architects can make a major contribution by
designing micro-architectures and schedulers that capitalize on
locality to decrease the need for movement and allow for the
efficient utilization and placement of resources to decrease
contention. In this way, architects can raise the practical
threshold. Otherwise, it really will be later rather than sooner
before quantum computing is a reality.

9  Future  work 

There is much work left to do on quantum architectures.
Right now, and for the foreseeable future, the goal of this
work should be to raise the critical threshold. What we
have presented in this paper is a set of tools and
architectural analyses that show the real threshold is ~1E-9. At
least two and perhaps as much as three orders of magnitude
of the gap from the theoretical value, however, are due to the
micro-architecture and tool chain infrastructure. We elaborate
below on ways to recover them:

Better error correction processes: Our current
infrastructure utilizes the error correction steps described in [19].
More complex, but parallel, steps are known [26].
Changing the front end of the tool chain to utilize these alternative
constructions could reduce by about 1/3 the minimum-time
component in Table 5. This is at the expense of more
complex ancilla.

Dynamically adding teleportation channels: In [27] the
authors describe an alternative way to move quantum state
around a large micro-architecture. Exploiting these
teleportation channels instead of direct movement where
appropriate could further parallelize the operations involved in
moving quantum state about.

Better micro-architectures: For this paper we did not
extensively study micro-architecture designs. Our focus was
more on the front end, in creating all of the tools required
to really study micro-architectures. Thus, the very next step
seems to be to design architectures that are better able to
exploit parallelism within the error correction processes.

Smarter scheduling: Our current scheduler is essentially
a greedy algorithm with a bounded window. Perfect
scheduling is NP-hard. There is a middle ground. Right now the
scheduler is micro-architecture agnostic. It can schedule
any set of quantum algorithms onto any micro-architecture.
Making the scheduler more micro-architecture and
error-code aware seems a rich area for performance gains. For
example, qubits are often operated on in repetitive ways.
Having efficient (perhaps hand-done) schedules for these
common-case operations that the scheduler could draw upon
to create a larger application schedule seems a viable
approach.


Hierarchical simulation: Currently, our simulator is
pessimistic. It is akin to an automated “counting” simulator
(used to count points of failure). If the actual device had the
technology characteristics specified, it would be more
reliable when executing the application than our simulator
predicts. How much more reliable is not yet known, but it is
speculated that it is perhaps as much as an order of magnitude.
The simulator can be made more precise by integrating a precise
device-level physics simulator and grouping operations into
large units. These units can be modeled precisely using the
device simulator and then their reliability parameters integrated
using the counting approach of our existing framework.


10  Conclusion 

In this paper, we described our work in designing an
instruction set architecture, compiler, device scheduler and
simulator for ion trap based quantum computers. Many design
choices in each of these components were made to make
them scale to real application sizes. Among them: the tools
compile in the input dataset and fully unroll all loops so that
the scheduler and simulator can provide concrete results; the
error correction compiler automatically transforms arbitrary
input programs into fault-tolerant versions; the scheduler
combines techniques from FPGA/CAD synthesis and
traditional processor compilers; finally, the simulator efficiently
models errors instead of quantum state in order to quickly
provide reliability information.

Using these tools, architects can design and quantitatively
evaluate large-scale architectures. In the past, quantum
researchers have had to make careful analytical models for
system reliability and performance. Now, they can evaluate
these systems directly by compiling applications for them,
scheduling them for performance, and simulating them for
reliability. We did this for a few tile-based designs and found
that a balanced design of 2 traps to 1 interconnect wire laid
out in a substrate performed best. We also found that the
critical threshold is in fact five orders of magnitude lower than
previously found by theoretical models alone. In addition,
we determined that much of the difference between the
theoretical and practical breaking point can be attributed to
problems that computer architects are particularly well suited to
solve.

Abstract

The assumption of maximum parallelism support for the
successful realization of scalable quantum computers has
led to homogeneous, “sea-of-qubits” architectures. The
resulting architectures overcome the primary challenges of
reliability and scalability at the cost of physically
unacceptable system area. We find that by exploiting the natural
serialization at both the application and the physical
microarchitecture level of a quantum computer, we can reduce the
area requirement while improving performance. In
particular we present a scalable quantum architecture design that
employs specialization of the system into memory and
computational regions, each individually optimized to match
hardware support to the available parallelism. Through
careful application and system analysis, we find that our
new architecture can yield up to a factor of thirteen savings
in area due to specialization. In addition, by providing a
memory hierarchy design for quantum computers, we can
increase time performance by a factor of eight. This result
brings us closer to the realization of a quantum processor
that can solve meaningful problems.

1  Introduction 

Conventional architectural design adheres to the concept
of balance. For example, the register file depth is matched
to the number of functional units, the memory bandwidth to
the cache miss rate, or the interconnect bandwidth matched
to the compute power of each element of a multiprocessor.
We apply this concept to the design of a quantum
computer and introduce the Compressed Quantum Logic Array
(CQLA), an architecture that balances components and
resources in terms of exploitable parallelism. The primary
goal of our design is to address the problem of large area,
approximately 1 m^2 on a side, of our previous design [1].


Specifically, we discover that the prevailing approach to
designing a quantum computer, that of supporting
maximal parallelism, is area inefficient. We also find that
exploitable parallelism is inherently limited by both resource
constraints and application structure. This lack of
parallelism gives us the freedom to increase density by
specializing components as blocks of memory and blocks of
computation.

We introduce the idea of periodically reducing our
investment in reliability and thereby increasing speed. By
encoding the compute regions differently than memory, we
provide very fast compute regions while allowing the
memory to be slower and more reliable. To ensure that the faster
compute region does not suffer from too many stalls, we
employ a quantum memory hierarchy wherein the cache
utilizes the same encoding mechanism as the compute region.
When making this effort to improve speed, it is critical that
overall system fidelity is maintained. We show how this can
be accomplished.

Due to the quantum no-cloning theorem [2], it is
necessary for all quantum data to physically move from source to
destination. We cannot create a copy of the data and send
the copy. Our architecture focuses on implementation with
an array of trapped atomic ions, one of the most mature and
scalable technologies, and one that provides a wealth of
experimental data. In ion traps, data is physically represented by
ions that are in constant motion, on a two-dimensional grid,
throughout the computation. Since this physical movement
is slow, yet unavoidable, it limits available parallelism at the
microarchitecture level.

At the application level, we find that only a limited
amount of parallelism can be extracted from key quantum
algorithms. This means that we may only need a few
compute blocks for all the qubits in memory. This is in contrast
to the popular “sea of qubits” model, which allows
computation at every qubit. Our results show up to a 13X increase
in density, particularly important in addressing our primary
goal, and a speedup of about 8. The large area improvement
brings the engineering of a quantum architecture closer to
the capabilities of current implementation technologies.

The choice of quantum error correction codes (ECC)
influences our results and the architecture. In our
specialized architecture analysis, we use the previously
considered Steane [[7,1,3]] code [3] and utilize a newly optimized
Bacon-Shor [[9,1,3]] code [4, 5]. The [[9,1,3]] code, though
larger than the [[7,1,3]] code since it uses more physical
qubits to encode a single logical qubit, requires far fewer
resources for error correction [6], thus reducing the overall
area and increasing the speed.

Furthermore, we find that communication is generally
dominated by computation for error correction. This
computation allows us to absorb the cost of moving data
between different regions of the architecture. Error correction
is so substantial, in fact, that quantum computers do not
suffer from the memory wall faced by conventional computers.
Thus our dense structure, with a communication
infrastructure based on our prior work [1], can accommodate
applications with highly demanding communication patterns.

In summary, the contributions of this work are: 1) Our
specialized architecture, the CQLA, successfully tackles the
issue of size, which has been the biggest drawback facing
large-scale realizable quantum computers. 2) We show that
current parallelism in quantum algorithms is inherently
limited and that consideration of physical resources and data
movement restricts it even further. 3) We present and analyze the
abstractions of memory, cache and computation units for a
quantum computer, based on the insight that we can reduce
reliability for the compute units and cache without
sacrificing overall computation fidelity. This approach helps us
significantly increase the performance of the system.

The paper is organized as follows. Section 2 provides
a background of the homogeneous QLA architecture and
the low-level microarchitecture assumptions of our system.
Section 3 motivates the specialized CQLA architecture and
introduces the architectural abstractions. Thereafter we
discuss how the Steane and Bacon-Shor error correction codes
affect the design of the CQLA. Results and analysis of our
abstractions are the focus of Section 5, following which we
provide details of computation versus communication
requirements of the most widely accepted quantum
applications. We end with future directions in Section 7 and our
conclusions in Section 8.

2  Background 

Our  architectural  model  is  built  upon  our  previous  work 
on  the  Quantum  Logic  Array  (QLA)  architecture  [1].  The 


QLA  architecture  is  a  hierarchical  array-based  design  that 
overcomes  the  primary  challenges  of  scalability  for  large- 
scale  quantum  architectures.  It  is  a  homogeneous,  tiled  ar¬ 
chitecture  with  three  main  components:  logical  qubits  im¬ 
plemented  as  self-contained  computational  tiles  structured 
for quantum error correction; trapped atomic ions
as  the  underlying  technology;  finally,  teleportation-based 
communication  channels  utilizing  the  concept  of  quantum 
repeaters  to  overcome  the  long-distance  communication 
constraints. 

2.1  The  Logical  Qubit 

The basic structure of the QLA, our prior work, implements
a fault-tolerant quantum bit, or logical qubit, as
a self-contained tile whose underlying construction is intended
for quantum error correction, by far the most dominant
and basic operation in a quantum machine [7]. Quantum
error correction is expensive because arbitrary reliability
is achieved by recursively encoding physical qubits
at the cost of exponential overhead. Recursive error correction
works by encoding N physical ion-qubits into a known
highly-correlated state that can be used to represent a single
logical data qubit. This data qubit is now at level 1 recursion
and may have the property of being in a superposition
of “0” and “1” much like a single physical qubit. Encoding
once more we can create a logical qubit at level 2 recursion
with N^2 physical ion-qubits. With each level, L, of encoding
the probability of failure of the system scales as p_0^{2^L}, where
p_0 is the failure rate of the individual physical components
given a fault-tolerant arrangement and sequence of operations
for the lower level components. The ability to apply
logical operations on a logical qubit without the need to decode
and subsequently re-encode the data is key to the existence
of fault-tolerant quantum microarchitecture design,
where arbitrary reliability can be efficiently reached through
recursive encoding.
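The double-exponential scaling described above can be sketched numerically. This is a minimal illustration, not the authors' model: the constant `c` (roughly the inverse of the error correction threshold) and the function names are assumptions introduced here, and with `c = 1` the expression reduces to the bare p_0^{2^L} scaling quoted in the text.

```python
def logical_failure_rate(p0: float, level: int, c: float = 1.0) -> float:
    """Failure rate after `level` rounds of concatenation.

    Uses the textbook form (c * p0) ** (2 ** level) / c. The constant c is
    a hypothetical parameter; c = 1 gives the bare p0 ** (2 ** level).
    """
    if level < 0:
        raise ValueError("level must be non-negative")
    return (c * p0) ** (2 ** level) / c


def levels_needed(p0: float, target: float, c: float = 1.0,
                  max_level: int = 10) -> int:
    """Smallest concatenation level whose failure rate is at most `target`."""
    for level in range(max_level + 1):
        if logical_failure_rate(p0, level, c) <= target:
            return level
    raise ValueError("target not reachable; c * p0 must be below 1")


if __name__ == "__main__":
    # Illustrative physical failure rate, not a value from the paper.
    for level in range(3):
        print(level, logical_failure_rate(1e-3, level))
```

Each extra level squares the failure rate, which is why a small number of levels suffices once the physical failure rate is below threshold.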

The  logical  qubits  in  the  QLA  are  arranged  in  a  regular 
array  fashion,  connected  with  a  tightly  integrated  repeater- 
based  [8]  interconnect.  This  makes  the  high-level  design  of 
the  QLA  very  similar  to  classical  tile  based  architectures. 
The  key  difference  is  that  the  communication  paths  must 
account  for  data  errors  in  addition  to  latency.  Integrated  re¬ 
peaters  known  as  teleportation  islands  redirect  qubit  traffic 
in  the  4  cardinal  directions  by  teleporting  data  from  one  is¬ 
land  to  the  next.  This  interconnect  design  is  one  of  the  key 
innovative  features  of  QLA  architecture,  as  it  allows  us  to 
completely  overlap  communication  and  computation,  thus 
eliminating  communication  latency  at  the  application  level 
of  the  program. 

Anticipating  technology  improvements  in  the  near  future 
we  found  that  for  performing  large,  relevant  instances  of 
Shor’s  factoring  algorithm,  sufficient  reliability  is  achieved 

Figure 1. (a) A simple schematic of the basic elements of a planar ion-trap for quantum computing. Ions are trapped
in any of the trapping regions shown and ballistically shuttled from one trapping region to another. When two ions are
together a two-qubit gate can be performed. (b) Our abstraction of the ion-trap layout. Each trapping region can hold up
to two ions for two-qubit gates. The trapping regions are interconnected with the crossing junctions which are treated as
a shared resource.


at  level  2  encoding  per  logical  qubit  using  the  Steane 
[[7,1,3]]  error  correction  code  [9].  In  the  QLA,  computa¬ 
tion  could  occur  at  any  logical  qubit  and  each  logical  gate  is 
followed  by  an  error  correction  procedure.  To  preserve  ho¬ 
mogeneity  and  maximum  flexibility  for  large-scale  applica¬ 
tions  each  logical  qubit  was  accompanied  by  all  necessary 
error  correction  auxiliary  qubit  resources  such  that  compu¬ 
tational  speed  was  maximized.  This  amounted  to  a  (1  :  2) 
ratio  between  logical  data  qubits  and  ancillary  qubits. 

2.2 Low-Level Physical Architecture Model

At  the  lowest  level  our  architecture  design  is  based  on  the 
ion-trap  technology  for  quantum  computation.  Initially  pro¬ 
posed  by  Cirac  and  Zoller  in  1995  [10],  the  technology  uses 
a  number  of  atomic  ions  that  interact  with  lasers  to  quantum 
compute.  Quantum  data  is  stored  in  the  internal  electronic 
and  nuclear  states  of  the  ions,  while  the  traps  themselves 
are  segmented  metal  traps  (or  electrodes)  that  allow  indi¬ 
vidual  ion  addressing.  Two  ions  in  neighboring  traps  can 
couple  to  each  other  forming  a  linear  chain  of  ions  whose 
vibrational  modes  provide  qubit-qubit  interaction  used  for 
multi-qubit  quantum  gates  [11,  12].  Together  with  single  bit 
rotations  this  yields  a  universal  set  of  quantum  logic  gates. 
All  quantum  logic  is  implemented  by  applying  lasers  on 
the  target  ions,  including  measurement  of  the  quantum  state 
[13,  14,  15,  16].  Sympathetic  cooling  ions  absorb  vibrations 
from  data  ions,  which  are  then  dampened  through  laser  ma¬ 
nipulation  [17,  18].  Recent  experiments  [19,  20,  21]  have 
demonstrated  all  the  necessary  components  needed  to  build 
a  large-scale  ion-trap  quantum  information  processor.  Fi¬ 
nally,  multiple  ions  in  different  traps  can  be  controlled  by 
focusing  lasers  through  MEMS  mirror  arrays  [22]. 

Figure  1  shows  a  schematic  of  the  physical  structure  of 
an  ion  trap  computer  element.  In  Figure  1(a)  we  see  a  single 


Operation      Time (µs), now (future)    Failure rate, now (future)
Single Gate    [illegible]                10^-4 (10^-8)
Double Gate    10 (10)                    0.03 (10^-7)
Measure        200 (10)                   0.01 (10^-8)
Movement       20 (10)                    0.005 (5 x 10^-8) /µm
Split          200 (0.1)
Cooling        200 (0.1)
Memory time    10 to 100 sec
Trap Size      ~200 (1 - 5) µm

Table  1.  Column  1  gives  estimates  for  execution 
times  for  basic  physical  operations  used  in  the  QLA 
model.  Currently  achieved  component  failure  rates 
are  based  on  experimental  measurements  at  NIST 
with  9Be+  ions,  and  using  24Mg+  ions  for  sympathetic 
cooling [14, 12]. All parameters are followed by their
projected values in parentheses, extrapolated following
recent literature [23, 24, 25], and discussions
with  the  NIST  researchers;  these  estimates  are  used 
in  modeling  the  performance  of  our  architecture. 

ion  trapped  in  the  middle  trapping  region.  Trapping  regions 
are  the  locations  where  ions  can  be  prepared  for  the  execu¬ 
tion  of  a  logical  gate,  which  is  implemented  by  an  external 
laser  source  pulsed  on  the  ions  in  the  trap.  In  the  figure  we 
see  an  ion  moving  from  the  far  right  trapping  region  to  the 
top-right  for  the  execution  of  a  two-bit  logical  operation. 

Figure  1(b)  demonstrates  our  abstraction  of  the  physical 
ion-trap  layout.  The  layout  can  be  represented  as  a  collec¬ 
tion  of  trapping  regions  connected  together  through  shared 
junctions. A fundamental time-step, or a clock cycle, in
an ion-trap computer will be defined as any physical, unencoded
logic operation (one-bit or two-bit), a basic move
operation from one trapping region to another, or a measurement.
Table 1 summarizes current experimental parameters
and corresponding optimistic parameters for ion-traps.




Figure 2. For a 64-qubit adder, the amount of parallelism
that can be extracted when resources are unlimited,
and when the number of gates per cycle is
limited. This figure shows that if 15 gates, or an unlimited
number of gates, could be performed in each
cycle, the total runtime would remain the same.


In our subsequent analysis we will assume that each clock
cycle for a fundamental time-step has a duration of 10 µs,
failure rates are 10^-8 for single-qubit operations and measurement,
10^-7 for CNOT gates [25], and on the order of 10^-6 per
fundamental move operation. The movement failure rate
is expected to improve from what it is now as trap sizes
shrink and electrode surface integrity continues to improve.
We will assume trap sizes of 5 µm each [26], and on the order
of 10 electrodes per trapping region [27], which gives
us a trapping region dimension (including the junction) of
50 µm. The parameters chosen for our study are optimistic
compared to [28] and [29]. Both of those papers assume
more pessimistic near-term parameters, which are useful for
building a 100-bit prototype, but probably not a scalable
quantum computer that can factor 1024-bit numbers using
Shor’s algorithm. Based on the quantum computing ARDA
roadmap [23], we feel justified in using aggressive parameters
when looking 10-15 years into the future.
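For reference, the assumed parameters can be collected in one place. The sketch below is only a bookkeeping aid: the class name, field names and the `regions_per_mm2` helper are hypothetical, and the values are the projected ones stated in the paragraph above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class IonTrapParams:
    """Projected ion-trap parameters assumed in the analysis (names are illustrative)."""
    cycle_time_s: float = 10e-6        # duration of one fundamental time-step
    p_single: float = 1e-8             # single-qubit gate failure rate
    p_measure: float = 1e-8            # measurement failure rate
    p_cnot: float = 1e-7               # CNOT failure rate
    p_move: float = 1e-6               # order-of-magnitude failure rate per move
    trap_size_um: float = 5.0          # size of one trap (electrode)
    electrodes_per_region: int = 10    # electrodes per trapping region

    def region_dimension_um(self) -> float:
        """Side length of one trapping region, including the junction."""
        return self.trap_size_um * self.electrodes_per_region

    def regions_per_mm2(self) -> float:
        """Trapping regions that would tile one square millimetre (illustrative only)."""
        per_side = 1000.0 / self.region_dimension_um()
        return per_side ** 2


if __name__ == "__main__":
    params = IonTrapParams()
    print(params.region_dimension_um(), "um per trapping region")
```

With 10 electrodes of 5 µm each, the helper reproduces the 50 µm trapping region dimension quoted in the text.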

3  Architectural  Abstractions 

This section motivates the need for a compact architecture
for quantum processors and describes our design, the
CQLA (Compressed Quantum Logic Array). We discuss
how separation into memory and compute regions benefits
the CQLA and then present our quantum memory hierarchy.

3.1  Motivation 

Conventional  quantum  processor  designs  are  based  on 
the  sea-of-qubits  design  and  allow  computation  to  take 


place anywhere in the processor. This design philosophy
follows the idea of maximum parallelism and is employed
in our previous work [1]. The area consumption of such a
design, however, is untenably large, about 1 m^2 to factor a
1024-bit number.

When we consider the amount of available parallelism in
quantum applications, we discover that much is to be gained
by limiting computation to a specifically designated location.
The remaining area can be optimized for storage of
quantum data. A good example of the benefit of specialization
in quantum applications is the Draper carry-lookahead
quantum adder [30], which forms a basic component
of Shor’s quantum factoring algorithm [31]. Figure 2 shows
that providing unlimited computational resources for a 64-bit
adder does not offer a performance benefit over limiting
the computation to 15 locations. As described in Section
2, each data location where computation is allowed requires
twice as many ancillary qubits as data qubits. In this example,
by providing only 15 compute locations instead of
64, we can reduce the area consumed by the adder by approximately
half and yet have no change in performance.

3.2  Specialized  Components 

The fact that qubits in an ion-trap quantum processor
have long lifetimes when idle allows us to improve logical
qubit density in the memory. Qubits in memory can wait for
a longer time period between two consecutive error corrections.
We use this to significantly reduce the error correction
ancillary resources in memory, thereby increasing its density.
The majority of computation, on the other hand, is an interaction
between two distinct logical qubits. To maintain
adequate system fidelity, every gate must be followed by an
error correction procedure. Consequently, a quantum processor
spends most of its time performing error correction,
and the compute regions are designed to allow fast error
correction by providing a greater number of ancilla in the
logical qubits. Figure 3(a) shows a specialization into compute
and memory regions. The ratio of (data : ancilla) can be
seen to be (8 : 1) for memory and (1 : 2) for the compute
region.
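The effect of these ratios on area can be checked with a back-of-the-envelope count. The sketch below is illustrative only and rests on a simplifying assumption not stated in the paper: every logical qubit, data or ancilla, occupies one equal unit of area, and interconnect is ignored. The function and constant names are hypothetical.

```python
COMPUTE_RATIO = (1, 2)  # (data : ancilla) in a compute region
MEMORY_RATIO = (8, 1)   # (data : ancilla) in memory


def area_units(data_qubits: int, ratio: tuple) -> float:
    """Area in logical-qubit units for `data_qubits` plus their share of ancilla."""
    data, ancilla = ratio
    return data_qubits * (data + ancilla) / data


if __name__ == "__main__":
    # 3 data qubits in compute occupy the same area as 8 data qubits in memory.
    print(area_units(3, COMPUTE_RATIO), area_units(8, MEMORY_RATIO))

    # 64-qubit adder: all 64 locations compute-capable vs. 15 compute + 49 memory.
    homogeneous = area_units(64, COMPUTE_RATIO)
    specialized = area_units(15, COMPUTE_RATIO) + area_units(49, MEMORY_RATIO)
    print(specialized / homogeneous)
```

Under this unit-area assumption the specialized layout needs a little over half the area of the homogeneous one, in line with the "approximately half" estimate for the 64-bit adder.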

While  specialization  helps  address  our  primary  goal  of 
reducing  size,  it  can  possibly  also  reduce  performance.  In 
Section  5  we  show  how  judiciously  choosing  the  size  of  the 
compute  region  helps  maintain  adequate  performance  while 
simultaneously reducing size.

3.3  Quantum  Memory  Hierarchy 

Another  important  architectural  design  choice  is  the  ef¬ 
fect  of  the  error  correction  code  chosen  in  both  the  mem¬ 
ory  and  the  compute  regions.  Error  correction  is  the  most 
dominant  procedure  and  the  resources  used  increase  expo¬ 
nentially  with  each  level  of  concatenation.  In  addition  to 

Figure 3. (a) Memory is denser since it has fewer ancilla qubits. The figure shows 3 data qubits in the compute block
which take the same area as 8 data qubits in memory. In the CQLA each compute block holds nine data qubits and 18
ancilla. Both compute and memory are at level 2 encoding. (b) Memory is at level 2 encoding, while the compute and
cache are at level 1 encoding. The complete CQLA consists of memory at level 2, compute regions at level 2 and also
a cache and compute region at level 1.


resources, the time to error correct increases exponentially
with each level of concatenation. The benefit of concatenated
error correction is that the reliability of each operation
increases double exponentially, thus allowing a far
greater number of total operations to be performed. For any
application, not all logical qubits are acted upon by
gates for the entire duration of the algorithm. In fact, just
like in classical computers, data locality is a common phenomenon.
This implies that a logical qubit could start at
level 2 encoding, be encoded at level 1 during the peak in
its activity and return to level 2 when idle.

We now introduce a quantum memory hierarchy, in addition
to the specialized design. Memory at level 2, which is
optimized for area and reliability, will be inherently slower
than a computational structure at level 1, optimized for gate
execution. This necessitates a cache that can alleviate
the need for constant communication.

Figure  3(b)  outlines  this  approach,  the  separation  be¬ 
tween  memory  and  compute  regions.  The  cache  and  the 
compute  regions  here  are  similar  to  Figure  3(a)  in  every  way 
save  that  they  are  at  a  lower  level  of  encoding.  In  the  mem¬ 
ory  hierarchy,  memory  and  cache  have  a  similar  design, 
only  memory  is  at  a  higher  level  of  encoding,  and  hence  is 
slower  and  much  more  reliable.  The  critical  feature  here  is 
the  transfer  network  which  is  more  complicated  and  hence 
slower than the teleportation channels described above. The
transfer  network  comes  into  play  only  when  we  change  the 
encoding  of  a  logical  qubit.  For  all  other  communication 
(within  compute  blocks,  between  cache  and  compute  blocks 
and  within  memory)  teleportation  is  still  the  chosen  mech¬ 
anism.  Section  4  describes  how  the  transfer  process  is  per¬ 
formed  in  a  fault-tolerant  manner. 


4  Error  Correction  and  Code  Transfer 

In  this  section  we  describe  the  cost  of  the  error  correc¬ 
tion  circuits  and  code-transfer  networks  we  use  when  a  spe¬ 
cific  physical  layout  is  considered.  Section  2.2  describes  in 
detail  our  technology  parameters,  which  we  find  to  be  nec¬ 
essary  for  such  a  large-scale  architecture.  These  parameters 
allow  the  large  scalability  to  be  achieved  because  the  phys¬ 
ical  component  failure  rates  are  below  the  threshold  value 
needed  for  efficient  error  correction  [32]. 

4.1  Error  Correction  Codes 

Some  of  the  best  error  correction  codes  (ECC)  are  ones 
that  use  very  few  physical  qubits,  and  allow  “easy”  fault- 
tolerant  gate  implementations.  A  requirement  of  a  fault- 
tolerant  system  is  that  computation  proceeds  without  decod¬ 
ing  the  encoded  data.  Thus  logical  gates  are  implemented 
directly  on  encoded  qubits,  ensuring  that  errors  introduced 
during  the  gate  can  be  corrected.  Many  code  choices  for 
EC  allow  transversal  logical  gate  implementation,  which 
means  that  the  same  physical  gate  acts  on  each  lower-level 
qubit. 

Each  logical  quantum  gate  is  preceded  and  followed  by 
an  error  correction  procedure.  The  EC  procedure  works 
by  encoding  ancillary  qubits  in  the  logical  “0”  state  of  the 
data  and  interacting  the  data  and  the  ancilla.  The  interaction 
causes  errors  in  the  data  to  propagate  to  the  ancilla  and  to 
be  detected  when  the  ancilla  is  measured.  There  are  several 
very  important  logical  gates  that  we  must  consider  during 
error correction. The bit-flip gate, X, flips the value of the
qubit by exchanging the amplitudes of its “0” component
and its “1” component. The phase-flip gate, Z, acts
only on the qubit’s “1” component by changing its sign. The
most important gate is the controlled-X gate (denoted as the
CNOT gate), which flips the state of the target qubit whenever
the state of the control qubit is set. Errors on the data can
be understood as the product of phase-flips and bit-flips.
A syndrome is extracted for each type of error. We only
present the cost of error correction networks and details relevant
to building a large-scale architecture. The interested
reader can refer to the literature for additional theoretical
information [33].
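The action of the three gates named above can be made concrete on small state vectors. This is a minimal sketch on unencoded qubits, with states written as plain lists of amplitudes; the function names and the basis ordering (control qubit first) are conventions chosen here for illustration.

```python
def apply_x(state):
    """Bit-flip: swap the amplitudes of the "0" and "1" components."""
    a0, a1 = state
    return [a1, a0]


def apply_z(state):
    """Phase-flip: change the sign of the "1" component only."""
    a0, a1 = state
    return [a0, -a1]


def apply_cnot(state):
    """CNOT on two qubits, amplitudes ordered as 00, 01, 10, 11 (control first).

    The target is flipped only when the control is "1", which swaps the
    amplitudes of the 10 and 11 components.
    """
    a00, a01, a10, a11 = state
    return [a00, a01, a11, a10]


if __name__ == "__main__":
    psi = [0.6, 0.8]                      # 0.6 "0" + 0.8 "1"
    print(apply_x(psi))                   # amplitudes exchanged
    print(apply_z(psi))                   # sign of the "1" amplitude flipped
    print(apply_cnot([0, 0, 1, 0]))       # "10" becomes "11"
```

Applying X then Z gives the same state as Z then X up to an overall sign, which is why a general error can be treated as a product of bit-flips and phase-flips with one syndrome for each.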


Figure 4. A high-level view of an error correction sequence.
Two syndromes for bit-flip and phase-flip errors
are extracted.


Figure 4 is a simple schematic of the general error correction
procedure, where time flows from left to right and
each line represents the evolution of an encoded logical
qubit. An error correction code is labeled by [[n, k, d]], encoding
k logical qubits into n qubits and correcting (d - 1)/2
errors. If our target reliability is such that we require
L levels of recursion, each line in Figure 4 represents n^L
level zero qubits. For the bit-flip error syndrome the ancilla
are encoded into the logical (0+1) state and interact with the
data through a transversal CNOT gate, which is essentially
n level (L-1) transversal CNOT gates of which the ancillary
qubits are targets. Each of the lower level CNOT gates is
followed by a lower level error correction unless the lower
level is zero. In our architecture analysis we provide
information about two error correcting codes: the Steane
[[7,1,3]] code [9], and an improved version of the Shor
[[9,1,3]] code [34] denoted as the Bacon-Shor code [4, 5, 6].
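The [[n, k, d]] bookkeeping above reduces to two small formulas, sketched here for the two codes considered. The function names are illustrative; the arithmetic follows directly from the definitions in the paragraph.

```python
def correctable_errors(d: int) -> int:
    """Number of errors a distance-d code corrects: (d - 1) / 2, rounded down."""
    return (d - 1) // 2


def level_zero_qubits(n: int, level: int) -> int:
    """Physical (level zero) data qubits behind one logical qubit at `level`."""
    return n ** level


if __name__ == "__main__":
    for name, (n, k, d) in {"Steane": (7, 1, 3), "Bacon-Shor": (9, 1, 3)}.items():
        print(name, "corrects", correctable_errors(d), "error;",
              "level 1:", level_zero_qubits(n, 1),
              "level 2:", level_zero_qubits(n, 2))
```

Both codes correct a single error, and at level 2 they use 49 and 81 data qubits respectively.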

The  Steane  [[7,1,3]]  Code  encodes  1  qubit  into  7  qubits, 
and  is  the  smallest  error  correction  code  allowing  transver¬ 
sal  gate  implementation  for  all  gates  involved  in  concate¬ 
nated  error  correction  algorithms.  The  addition  of  the  T 
phase  gate,  which  is  harder  to  implement,  provides  univer¬ 
sal  quantum  logic  using  the  [[7, 1,3]]  error  correcting  code. 
For  this  reason  it  was  used  as  the  underlying  error  correcting 
code  in  the  analysis  of  the  QLA  architecture  [1].  It  consists 
of 7 data ions which encode our logical level 1 qubit with 14
ancillary ions used for error correction, seven of which are
used in the error correction and the other seven verify the ancilla.

Considering communication, the level 1 error correction
circuit will take 154 cycles, where each cycle is on the
order of 10 microseconds, and the error correction time can be
as large as 0.003 seconds per error correction procedure at level 1.
A level 2 [[7,1,3]] qubit will be composed of 7 level 1 data qubits
and 7 level 1 ancilla qubits; there is no need for verification
ancilla at L = 2.

Error Correction Metric Summary

Architecture Metric               Error Code - Level          Value
EC Time (seconds)                 [[7,1,3]] - L1              3.1 x 10^-3
                                  [[7,1,3]] - L2              0.3
                                  [[9,1,3]] - L1              1.2 x 10^-3
                                  [[9,1,3]] - L2              0.1
Qubit Size (mm^2)                 [[7,1,3]] - L1              0.2
                                  [[7,1,3]] - L2              3.4
                                  [[9,1,3]] - L1              0.1
                                  [[9,1,3]] - L2              2.4
Transversal Gate Time (seconds)   [[7,1,3]] - L1              6.2 x 10^-3
                                  [[7,1,3]] - L2              0.5
                                  [[9,1,3]] - L1              2.4 x 10^-3
                                  [[9,1,3]] - L2              0.2
Size, number of logical qubits    [[7,1,3]] - L1              7
                                  [[7,1,3]] - L1 (ancilla)    21
                                  [[7,1,3]] - L2              49
                                  [[7,1,3]] - L2 (ancilla)    441
                                  [[9,1,3]] - L1              9
                                  [[9,1,3]] - L1 (ancilla)    12
                                  [[9,1,3]] - L2              81
                                  [[9,1,3]] - L2 (ancilla)    298

Table  2.  Error  Correction  Metric  Summary.  Given 
the  fact  that  we  use  optimistic  ion-trap  parameters  all 
numbers  are  estimates  and  are  thus  rounded  to  only 
one  significant  digit. 


The size of a level 2 qubit will be 3.4 mm^2, and a fully serialized
error correction will last approximately 0.3 seconds
(this is two orders of magnitude more than the time to error
correct at level 1).
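The timing figures quoted for the Steane code can be cross-checked with simple arithmetic. The sketch below is an illustration using only numbers from the text (154 cycles, 10 microsecond cycles, roughly 0.003 s at level 1 and 0.3 s at level 2); the variable and function names are hypothetical.

```python
CYCLE_TIME_S = 10e-6  # assumed duration of one fundamental time-step


def cycles_to_seconds(cycles: int, cycle_time_s: float = CYCLE_TIME_S) -> float:
    """Convert a cycle count into wall-clock seconds."""
    return cycles * cycle_time_s


# Quoted error correction times for the [[7,1,3]] code, in seconds.
EC_TIME_LEVEL1_S = 0.003
EC_TIME_LEVEL2_S = 0.3

if __name__ == "__main__":
    print("154 cycles =", cycles_to_seconds(154), "s")
    print("level 2 / level 1 =", EC_TIME_LEVEL2_S / EC_TIME_LEVEL1_S)
```

The 154-cycle circuit alone accounts for about 1.5 ms, and the level 2 to level 1 ratio of about 100 matches the "two orders of magnitude" statement.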

Bacon-Shor [[9,1,3]] Code: The [[9,1,3]] code was the first
error correcting code to be discovered for arbitrary errors
[34]. Recent observations make this code faster and spatially
smaller than the [[7,1,3]] code [4, 5, 6]. The compact
structure of the physical layout for the [[9,1,3]] code significantly
improves communication requirements. The error
correction time is only 0.001 seconds at level 1 and 0.1
seconds at level 2. The level 2 qubit size is approximately
2.4 mm^2. Table 2 summarizes the error correction codes we have
used and their parameters for some useful architecture metrics.

4.2  Code  Transfer  Networks:  Overview 

One of the most interesting components of the memory
hierarchy is the code transfer region. This region transfers
data encoded in code C1 to a second code C2 without
the need to decode. Figure 5 illustrates this concept.
The transfer network teleports the data in C1 to C2, where
C1 and C2 may be any two error correcting codes. The
code teleportation procedure works much the same way
as standard data teleportation that is used for communication.
A correlated ancillary pair is prepared first between

(seconds)    7-L1    7-L2    9-L1    9-L2
7-L1         0       0.6     0.02    0.2
7-L2         1.3     0       1.3     1.5
9-L1         0.01    0.5     0       0.1
9-L2         0.4     0.9     0.4     0

Table  3.  Transfer  network  latency  for  a  combination 
of  the  [[7, 1,3]]  and  [[9, 1,3]]  codes. 


C1 and C2 through the use of a multi-qubit cat-state (i.e.
“(00...0 + 11...1)”). The data qubit interacts with the equivalently
encoded ancillary qubit through a CNOT gate, and
the two are measured. Following the measurement the state
of the data is recreated at the C2 encoded ancillary qubit.
This process is required every time we transfer a qubit from
memory to the cache or vice versa. Table 3 summarizes the
times for different code transfer combinations between levels
1 and 2 for the [[7,1,3]] and the [[9,1,3]] codes.
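The latencies in Table 3 lend themselves to a simple lookup. In the sketch below the values are transcribed from Table 3, but the orientation is an assumption: the extracted table does not say which axis is the source code, so rows are treated as the source and columns as the destination. The names `transfer_latency` and `round_trip` are hypothetical.

```python
CODES = ("7-L1", "7-L2", "9-L1", "9-L2")

# Transfer latency in seconds. ASSUMPTION: row = source, column = destination.
TRANSFER_LATENCY_S = {
    "7-L1": (0.0, 0.6, 0.02, 0.2),
    "7-L2": (1.3, 0.0, 1.3, 1.5),
    "9-L1": (0.01, 0.5, 0.0, 0.1),
    "9-L2": (0.4, 0.9, 0.4, 0.0),
}


def transfer_latency(src: str, dst: str) -> float:
    """Latency of one code transfer from `src` to `dst` encoding."""
    return TRANSFER_LATENCY_S[src][CODES.index(dst)]


def round_trip(a: str, b: str) -> float:
    """Latency of moving a qubit from encoding `a` to `b` and back again."""
    return transfer_latency(a, b) + transfer_latency(b, a)


if __name__ == "__main__":
    # Memory at level 2 and cache at level 1, both in the Bacon-Shor code.
    print(round_trip("9-L2", "9-L1"), "s for a memory -> cache -> memory trip")
```

Because a qubit pays this cost on every move between memory and cache, the round-trip latency is the figure a cache policy would want to amortize over many level 1 gates.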


Figure 5. Code Teleportation Network from Code 1
(C1) to Code 2 (C2). C1 and C2 can even be the same
error correcting code, but different levels of encoding.
The solid triangles denote an error correction step.
[Figure annotation: (4 CatQubits + 6) cycles.]

5  CQLA  Analysis  and  Results 

This  section  provides  analysis  of  the  abstractions  pre¬ 
sented  in 

to  perform  quantum  modular  exponentiation. 

5.1  Specialization  into  Memory 

We now analyze our design, the CQLA, when it separates
the quantum processor into memory and compute
regions. High density in memory is achieved by greatly
reducing the number of logical ancilla qubits per logical data
qubit: the (data : ancilla) ratio is (8 : 1) in memory and
(1 : 2) in the compute regions. This greatly reduces overall
area since prior work had a ratio of (1 : 2) throughout the
architecture. Thus the memory is denser, but slower, which
is permissible due to the large memory wait times.


Quantum modular exponentiation is the most time consuming
part of Shor’s algorithm, and the Draper carry-lookahead
adder is its most efficient implementation. This
adder comprises single-qubit gates, two-qubit CNOT gates and
three-qubit Toffoli gates, and is dominated by Toffoli gates.
The time to perform a single fault-tolerant Toffoli is equal
to the time for fifteen two-qubit gates, each of which is followed
by an error-correction step. Table 4 shows the savings
that can be achieved when using denser memory. Note
that performance is minimally impacted for the Steane code
as we exploit the limited parallelism in the adder. We address
the parallelism available within the application itself;
the number of compute blocks needed to maximally
exploit this parallelism will change with problem size.
Figure 6(a) shows how, for a fixed problem size, utilization of
each compute block decreases with an increase in the number
of compute blocks. Clearly, the decrease in utilization
is offset by the increase in overall performance. Thus the
challenge here is to find the balance between utilization and
performance.
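The cost of a fault-tolerant Toffoli follows from the fifteen-gate rule stated above. The sketch below is a rough estimate under two assumptions that are not confirmed by the text: that the level 2 transversal gate times of Table 2 (0.5 s for the Steane code, 0.2 s for the Bacon-Shor code) already include the accompanying error correction, and that the fifteen gates run strictly one after another. All names are illustrative.

```python
TWO_QUBIT_GATES_PER_TOFFOLI = 15

# Level 2 transversal gate times in seconds, as listed in Table 2.
# ASSUMPTION: these already include the error correction around each gate.
GATE_TIME_S = {
    ("steane", 2): 0.5,
    ("bacon-shor", 2): 0.2,
}


def toffoli_time(code: str, level: int) -> float:
    """Estimated time for one fully serialized fault-tolerant Toffoli gate."""
    return TWO_QUBIT_GATES_PER_TOFFOLI * GATE_TIME_S[(code, level)]


if __name__ == "__main__":
    for code in ("steane", "bacon-shor"):
        print(code, toffoli_time(code, 2), "s per Toffoli at level 2")
```

Since the adder is dominated by Toffoli gates, this per-Toffoli time largely sets the adder latency, which is why faster error correction translates almost directly into speedup.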

We compare all our results to [1], which used only
the Steane ECC. Since the Bacon-Shor ECC uses fewer
overall resources and allows faster error-correction, a
design based on these codes not only is much smaller,
but is also faster. The CQLA thus reduces the area
required by a factor of 9 with minimal performance
reduction for the Steane ECC, and by a factor of 13
with a speedup of 2 when using the Bacon-Shor ECC.
To compare the relative merit of our design choices, we
use the gain product, which can be defined by
GP = (Area_old * AdderTime_old) / (Area_CQLA * AdderTime_CQLA),
where AdderTime is the average time per adder for
modular exponentiation. The gain product indicates the
improvement in system parameters relative to our prior
work, the QLA. The higher the gain product, the better the
collective improvement in area and time of our system.
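Because the gain product is a ratio of area-time products, it factors into the area reduction multiplied by the speedup. The sketch below shows both forms; the function names are illustrative, and the sample inputs are the 32-bit, 4-compute-block row of Table 4 (area reduction 6.69 and 9.80, speedup 0.54 and 1.47).

```python
def gain_product(area_old: float, time_old: float,
                 area_new: float, time_new: float) -> float:
    """GP = (Area_old * AdderTime_old) / (Area_CQLA * AdderTime_CQLA)."""
    return (area_old * time_old) / (area_new * time_new)


def gain_product_from_factors(area_reduction: float, speedup: float) -> float:
    """Equivalent form: (Area_old / Area_CQLA) * (AdderTime_old / AdderTime_CQLA)."""
    return area_reduction * speedup


if __name__ == "__main__":
    # 32-bit input, 4 compute blocks (values taken from Table 4).
    print(round(gain_product_from_factors(6.69, 0.54), 2))  # Steane code
    print(round(gain_product_from_factors(9.80, 1.47), 2))  # Bacon-Shor code
```

A design that is slower than the QLA (speedup below 1) can still have a gain product above 1 when its area savings are large enough, which is the case for the Steane-code rows.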

Communication Issues: Toffoli gates cannot be directly
implemented on encoded data and have to be broken
down into multiple two-qubit gates. Performing a fault-tolerant
Toffoli between three logical qubits requires extra
logical ancilla and logical cat-state qubits. The flow of
data between these nine qubits to complete a single Toffoli
forms the most intense communication pattern during the
entire addition operation. To study the bandwidth requirements
during the Toffoli gates, we developed a scheduler
that would try to have all the requirements for communication
(creating EPR pairs, transporting and purifying them)
in place while the logical qubit to be transported was undergoing
error-correction after completion of the previous gate.
With a bandwidth of one channel, it was possible to overlap
communication with computation for the Steane [[7,1,3]]
code. To enable this overlap when using the Bacon-Shor

Input      Compute   Area Reduced (Factor of)   SpeedUp               Gain Product
Size       Blocks    St-Code    BSr-Code        St-Code    BSr-Code   St-Code    BSr-Code
32-bit     4         6.69       9.80            0.54       1.47       3.61       14.41
           9         3.22       4.74            0.97       2.9        3.14       13.74
64-bit     9         6.36       9.32            0.70       1.92       4.45       17.70
           16        3.79       5.56            0.98       3.0        3.71       16.68
128-bit    16        7.24       10.6            0.72       1.97       5.24       20.88
           25        4.90       7.17            0.96       2.84       4.70       20.36
256-bit    36        6.65       9.47            0.92       2.51       6.12       23.68
           49        5.07       7.43            0.98       2.98       4.96       22.14
512-bit    64        7.42       10.87           0.92       2.50       6.80       27.18
           81        6.06       8.87            0.98       2.91       5.94       25.81
1024-bit   100       9.14       13.4            0.80       2.19       7.35       29.35
           121       7.81       11.45           0.97       2.65       7.60       30.34

Table  4.  For  various  size  inputs,  this  table  shows  how  the  CQLA  performs  for  Modular  Exponentiation.  The  space 
saved  due  to  compressing  the  memory  blocks  and  separating  memory  and  compute  regions  is  shown  as  compared  to 
prior  work  [1].  St-Code  is  the  Steane  ECC  and  BSr-Code  is  the  Bacon-Shor  code.  The  Gain  Product  is  compared  with 
our prior work, the QLA, which has a Gain Product of 1.0.


code, the required bandwidth was three channels. Table 2
shows that while a logical qubit encoded in the Bacon-Shor
code is smaller when ancilla are considered, it has more
data qubits than the Steane code. Since only data qubits
are involved during teleportation, the time for teleporting
a logical qubit in the Bacon-Shor code is greater. In addition,
the Bacon-Shor codes take far fewer error-correction
cycles. These two factors push its bandwidth requirement
higher. Note that the higher bandwidth is accounted for in
the results of Table 4.

Superblocks: In the CQLA, several compute blocks together
form compute superblocks. This is done to exploit
the locality inherent to an application. Having larger superblocks
also increases the perimeter bandwidth between
the compute and memory regions of the CQLA. However, this
increase in bandwidth is offset by the much greater increase
in communication that a larger superblock requires. Intuitively,
at a certain point it becomes more efficient
to have multiple small superblocks instead of one large
superblock. To determine this point concretely, we plot
the change in bandwidth required against the change in bandwidth
available. Figure 6(b) shows that the cross-over point is
36 compute blocks per superblock, regardless of which
error-correction code is used. Beyond that point it is no longer beneficial
to increase the size of an individual compute superblock.

5.2  Memory  Hierarchy 

Reducing the encoding level of the compute region will
dramatically increase its speed. Recall that resources, time
and reliability all increase exponentially as we increase the
level of encoding. With the compute region at level 1 and
memory at level 2, the challenge is the very familiar one of
the CPU being an order of magnitude faster than the memory.
To maximize the benefit of a much faster compute region,
we introduce the quantum memory hierarchy. In our
hierarchy, the memory is at level 2 encoding (slow and reliable),
the cache is at level 1 (faster, less reliable) and the compute
region is also at level 1 (fastest, with the same reliability as
the cache). The difference in speed between the compute region
and the cache is due to the greater number of ancilla
in the compute region.

To  study  the  behavior  of  the  CQLA  with  a  cache  and 
multiple  encoding  levels,  we  developed  a  simulator  that 
models  a  cache.  The  simulator  takes  into  account  the  com¬ 
putation  cost  in  both  encoding  levels  and  also  the  cost  of 
transferring  logical  qubits  between  encoding  levels.  The 
application  under  consideration  is  still  the  Draper  carry- 
lookahead  adder.  Input  to  the  simulator  is  a  sequence  of  in¬ 
structions;  each  instruction  is  similar  to  assembly  language 
and  describes  a  logical  gate  between  qubits.  We  have  writ¬ 
ten  generators  that  output  this  code  in  a  form  that  can  take 
advantage  of  an  architecture  with  maximal  parallelism. 

When  the  simulator  runs  this  code  in  the  sequence  in¬ 
tended  by  the  Draper  carry-lookahead  adder,  the  cache  hit- 
rate  is  limited  to  20%.  To  improve  the  hit-rate,  we  utilize 
the  following  optimized  approach.  Since  we  are  schedul¬ 
ing  statically,  the  instruction  fetch  window  for  the  simulator 
can  be  the  whole  program.  The  simulator  takes  advantage 
of  this  by  first  creating  a  dependency  list  of  all  input  instruc¬ 
tions.  Then  it  carefully  selects  the  next  instruction  such  that 
probability  of  finding  all  required  operands  in  the  cache  is 
maximized.  This  optimized  fetch  yields  a  cache  hit-rate  of 
Figure 6. (a) Change in utilization as the number of compute blocks increases. (b) The point of intersection of the two
bottom curves is the optimal size of a compute superblock. These two curves are the bandwidth required (at the perimeter
of the compute superblock) in modular exponentiation and the bandwidth available. The third, steep curve is the worst-case
bandwidth required.


Figure 7. Cache hit-rate for different
adders when both cache and compute region are at
level 1 recursion. The largest cache considered holds
twice the number of logical qubits as the compute
block. Results for both the non-optimized version and
the optimized version are shown.


almost 85% regardless of adder size and cache size. The replacement
policy in the cache is least recently used. Figure
7 shows the cache hit-rates for different sized adders for the
non-optimized and optimized instruction fetch approaches.
If n is the number of logical qubits in the compute region,
the cache sizes we studied were n, 1.5n and 2n. As the graph
shows, the optimized fetch improves the hit-rate more than
increasing the cache size does. For the CQLA,
we thus employ a cache size of twice the number of qubits
in the compute region. The high hit-rate means the transfer
networks will not be overwhelmed.
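
The difference between in-order fetch and the optimized, dependency-aware fetch can be illustrated with a toy model. The sketch below is not the simulator described above: the instruction format (a tuple of operand qubit names), the conservative dependency rule, and the function name `hit_rate` are assumptions made for illustration, and the numbers it produces are unrelated to the 20% and 85% figures reported in the text.

```python
from collections import OrderedDict

def hit_rate(instructions, capacity, optimized):
    """Fraction of operand accesses that hit an LRU cache of `capacity` qubits.

    instructions: list of tuples of qubit names, one tuple per logical gate.
    optimized:    False -> issue in program order;
                  True  -> among instructions whose dependencies are done,
                           issue the one with the most operands already cached.
    """
    # Conservative dependency rule (an assumption): an instruction depends on
    # every earlier instruction that touches one of its qubits.
    deps = [
        {i for i in range(j) if set(instructions[i]) & set(instructions[j])}
        for j in range(len(instructions))
    ]
    cache = OrderedDict()  # keys ordered from least to most recently used
    pending = list(range(len(instructions)))
    done, hits, accesses = set(), 0, 0

    while pending:
        if optimized:
            ready = [j for j in pending if deps[j] <= done]
            nxt = max(ready, key=lambda j: sum(q in cache for q in instructions[j]))
        else:
            nxt = pending[0]
        for q in instructions[nxt]:
            accesses += 1
            if q in cache:
                hits += 1
                cache.move_to_end(q)            # refresh recency
            else:
                if len(cache) >= capacity:
                    cache.popitem(last=False)   # evict least recently used
                cache[q] = True
        pending.remove(nxt)
        done.add(nxt)
    return hits / accesses if accesses else 0.0

# Two independent gate streams interleaved in program order.
program = [("a0", "a1"), ("b0", "b1"), ("a0", "a1"), ("b0", "b1")]
print(hit_rate(program, capacity=2, optimized=False))
print(hit_rate(program, capacity=2, optimized=True))
```

In this example the in-order schedule evicts each working set just before it is reused, while the reordered schedule groups gates on the same qubits and so reuses cached operands. That is the effect the static scheduler exploits when its fetch window is the whole program.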

Fault-tolerance with multiple encoding levels: A quantum
computer running an application of size S = KQ, where
K is the number of time-steps and Q is the number of logical
qubits, will need to have a component failure rate of at most
$P_f = 1/KQ$. To evaluate the expected component failure
rate at some level of recursion we use Gottesman’s estimate
for local architectures [35], shown in Equation 1 below.

$$P_f = \frac{1}{c r^2}\left(c r^2 p_0\right)^{2^L} = \frac{p_{th}}{r^2}\left(\frac{r^2}{p_{th}}\,p_0\right)^{2^L} \qquad (1)$$

The value for r is the communication distance between
level 1 blocks, which are aligned in the QLA to allow r = 12
cells on average, and L denotes the level of recursion. The
threshold failure rate, $p_{th}$, for the Steane [[7,1,3]] circuit,
accounting for movement and gates, was computed in [36] to
be approximately $7.5 \times 10^{-5}$. Taking as $p_0$ the average of
the expected failure probabilities given in Table 1, and using
Equation 1, we find that for our system to be reliable
it can spend only 2% of the total execution time in level
1.  Recall  that  error-correction  is  the  most  frequently  pe- 
Par Xfer   Adder Size   L1 SpeedUp   L2 SpeedUp   Adder SpeedUp   Area Reduced   Gain Product
--------   ----------   ----------   ----------   -------------   ------------   ------------
Steane [[7,1,3]] Code
   10          256        17.417        0.98          6.25            5.07           31.68
               512        17.41         0.97          6.33            6.06           38.38
              1024        18.18         0.88          4.93            9.14           45.06
    5          256        10.409        0.98          4.05            5.07           24.99
               512        10.408        0.97          4.04            6.06           24.48
              1024        10.96         0.88          2.94            9.14           26.87
Bacon-Shor [[9,1,3]] Code
   10          256         9.61         1.53          5.92            7.43           43.99
               512         9.61         2.28          8.82            8.87           78.23
              1024        10.15         2.00          8.10           13.4           108.53
    5          256         5.17         1.53          3.66            7.43           27.19
               512         5.17         2.28          5.45            8.87           48.37
              1024         5.49         2.00          4.99           13.40           66.90

Table 5. This table shows the results of incorporating a memory hierarchy and two separate encoding levels. Depending
on the number of parallel transfers possible between memory and cache, we can expect different speedup values for
the adder at level 1. This, combined with results from Table 5.1, gives us the final Gain Product. By comparison, prior work
has a Gain Product of 1.0.


formed  operation  in  the  CQLA.  For  the  Steane  code,  level 
2 error correction takes 0.3 sec and level 1 takes $3.1 \times 10^{-3}$
sec, which is approximately 1% of the level 2 time. Thus
if all operations performed by the CQLA were equally divided
between level 1 and level 2 operations, the system would
maintain its fidelity. The Bacon-Shor ECC can be analyzed
in a similar manner, and its results are more favourable
due to a higher threshold.

The CQLA architecture now consists of a memory at
level 2, a compute region also at level 2, a cache and a compute
region at level 1, and transfer networks for changing the
qubit encoding levels. Since quantum modular exponentiation
is performed by repeated quantum additions, we could
perform half of these additions completely in level 2 and the
other half in level 1. To comfortably maintain the fidelity of
the system, we perform one level 1 addition for every two
level 2 additions. The resulting increase in performance is
shown in Table 5.
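
The time-budget argument above can be checked with a few lines of arithmetic. The sketch below uses the Steane-code error-correction times and the 2% level 1 budget quoted in the text. It assumes, as a simplification not stated in the source, that every operation costs one error-correction cycle at its encoding level; the function names are illustrative.

```python
L2_EC_SECONDS = 0.3         # level 2 Steane error correction (from the text)
L1_EC_SECONDS = 3.1e-3      # level 1 Steane error correction (from the text)
LEVEL1_TIME_BUDGET = 0.02   # at most 2% of execution time may be at level 1

def level1_time_fraction(level1_ops, level2_ops):
    """Fraction of total time spent at level 1 for a given operation mix."""
    t1 = level1_ops * L1_EC_SECONDS
    t2 = level2_ops * L2_EC_SECONDS
    return t1 / (t1 + t2)

def required_failure_rate(time_steps, logical_qubits):
    """P_f = 1 / (K * Q) for an application of size S = K * Q."""
    return 1.0 / (time_steps * logical_qubits)

for mix in [(1, 1), (1, 2)]:
    fraction = level1_time_fraction(*mix)
    print(mix, f"{fraction:.4f}", fraction <= LEVEL1_TIME_BUDGET)
```

Under these assumptions, an equal split of operations already keeps the level 1 share of execution time close to 1%, and the one-to-two split roughly halves that share. This matches the description of the one-to-two split as maintaining fidelity comfortably.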

6  Application  Behavior 

In this section, compute and memory are at level 2 encoding.
Contrary to traditional silicon-based processors, in the
CQLA a single communication step does not take longer
than the computation of a single gate. The reason behind
this phenomenon is the lack of reliability of quantum data,
which forces us to perform an error-correction procedure
after each gate. The time to complete a fault-tolerant Toffoli
is about 20 times greater than that of a two-qubit CNOT gate.
The applications we study are modular exponentiation and
the quantum Fourier transform.

6.1  Shor’s  Algorithm 

Shor’s  algorithm  is  the  most  celebrated  of  quantum  algo¬ 
rithms  due  to  its  potential  exponential  advantage  over  con¬ 
ventional  algorithms  and  its  application  to  breaking  public- 
key  cryptography  [31].  Shor’s  algorithm  is  primarily  com¬ 
posed  of  two  parts,  the  modular  exponentiation  and  the 
quantum  fourier  transform. 
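
Modular exponentiation reduces to repeated additions, which is why the adder is the unit of work throughout this study. The sketch below is a classical analogue only, showing how exponentiation decomposes into multiplications and multiplications into additions. The quantum circuit performs the corresponding steps reversibly on superpositions, which this code does not model. All function names are illustrative.

```python
def mod_add(a, b, n):
    """Modular addition: the primitive an adder circuit provides."""
    return (a + b) % n

def mod_mul(a, b, n):
    """a * b mod n by double-and-add, using only mod_add."""
    result, addend = 0, a % n
    while b:
        if b & 1:
            result = mod_add(result, addend, n)
        addend = mod_add(addend, addend, n)
        b >>= 1
    return result

def mod_exp(base, exponent, n):
    """base ** exponent mod n by square-and-multiply, using only mod_mul."""
    result, square = 1 % n, base % n
    while exponent:
        if exponent & 1:
            result = mod_mul(result, square, n)
        square = mod_mul(square, square, n)
        exponent >>= 1
    return result

print(mod_exp(7, 13, 15) == pow(7, 13, 15))
```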

Modular  Exponentiation:  The  execution  of  modular  ex¬ 
ponentiation  is  dominated  by  Toffoli  gates.  To  keep  the 
compute  block  from  having  to  wait  for  qubits,  and  hence 
stalling,  the  bandwidth  around  the  perimeter  of  the  com¬ 
pute  block  has  to  accommodate  the  transfer  of  three  qubits 
to  and  from  memory.  Intuitively,  since  the  CQLA  is  a  mesh, 
and  the  bottleneck  in  bandwidth  will  be  at  the  edge  of  the 
compute  blocks,  having  adequate  bandwidth  at  this  edge  is 
sufficient  for  the  rest  of  the  mesh. 

Based on the communication results from [1], we calculate
that two channels on the perimeter of the compute block
would provide adequate bandwidth for all required communication.
We compute the time required for all communication
steps and compare it against the total computation time
for differently sized adders. The result is shown in Figure 8(a) and
demonstrates that communication requirements do not adversely
impact the design.

Quantum Fourier Transform: While the Quantum
Fourier Transform (QFT) comprises a small fraction of the
overall Shor’s algorithm, it requires all-to-all personalized
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Figure  8.  Total  communication  and  computation  times  for  the  two  components  of  Shor’s  algorithm,  (a)  Modular  Expo¬ 
nentiation  (b)  Quantum  Fourier  Transform  (QFT).  Although  communication  is  significant  in  the  QFT,  Modular  exponenti¬ 
ation  dominates  Shor’s  algorithm.  Both  these  results  are  for  the  Bacon-Shor  code 


communication between data qubits. In addition, it uses
only one-qubit and two-qubit gates, which consume much
less time. As a result, studying the performance of the QFT
gives us insight into how the CQLA will behave when
faced with a communication-heavy, computation-light
application.

In  the  worst  case,  all  nine  data  qubits  (maximum  capac¬ 
ity  of  the  compute  block)  would  have  to  be  transferred  to  or 
from  memory  simultaneously. 

Between compute blocks, the QFT’s all-to-all personalized
communication must be supported on the CQLA mesh
network. We leverage the vast amount of prior work done in
studying mesh networks, and employ a near-optimal algorithm
proposed in [37]. The total time for communication
for varying problem sizes is shown in Figure 8(b). Note that
while communication time is a little less, it closely tracks
the computation time for all problem sizes. This is due to
the difference between the time to error correct a single logical
qubit and the time to transport a single qubit, which stays constant
regardless of the problem size.

7  Future  Work 

A  high-level  goal  of  this  work  is  to  build  abstractions 
from  which  architects  and  systems  designers  can  examine 
open  issues  and  help  guide  the  substantial  basic  science  and 
engineering  under  investment  towards  building  a  scalable 
quantum  computer.  The  primary  focus  of  our  work  has  been 
system  balance.  The  driving  force  in  this  balance  has  been 
application  parallelism.  A  key  open  issue  is  the  restruc¬ 
turing  of  quantum  algorithms  to  manage  this  parallelism  in 


the  context  of  system  balance.  From  an  architectural  point 
of  view,  the  most  relevant  abstract  properties  are  density  of 
functional  components,  the  memory  hierarchy  and  commu¬ 
nication  bandwidth. 

While  our  work  has  focused  on  trapped  ions,  most  scal¬ 
able  technologies  will  have  a  similar  two-dimensional  lay¬ 
out  where  our  techniques  can  be  easily  applied.  This  is  be¬ 
cause  the  density  is  determined  by  the  ratio  of  data  to  ancilla 
rather  than  physical  details  of  the  underlying  technology. 

For  ion-traps,  lasers  can  also  be  a  control  issue.  We 
plan  to  study  how  our  architecture  can  minimize  the  number 
of  lasers  and  minimize  the  power  consumed  by  each  laser, 
since  power  is  proportional  to  fanout.  Efficiently  routing 
control  signals  to  all  electrodes  in  an  ion-trap  is  a  challeng¬ 
ing  proposition,  one  that  has  not  yet  been  considered  for 
large  systems.  Currently,  we  perform  the  whole  adder  at 
the  fast  level  1  encoding  or  at  the  level  2  encoding;  clever 
instruction  scheduling  techniques  can  allow  us  to  improve 
performance  by  reducing  granularity. 

8  Conclusion 

The  technologies  and  abstractions  for  quantum  comput¬ 
ing  have  evolved  to  an  exciting  stage,  where  architects  and 
system  designers  can  attack  open  problems  without  intimate 
knowledge  of  the  physics  of  quantum  devices.  We  explore 
the  amount  of  parallelism  available  in  quantum  algorithms 
and  find  that  a  specialized  architecture  can  serve  our  needs 
very well. The CQLA design is an example where architectural
techniques of specialization and balanced system
design have led to up to a 13X improvement in density and
an 8X increase in performance, while preserving fault tolerance.
We hope that further application of compiler and
system optimizations will lead to even more dramatic gains
towards a scalable, buildable quantum computer.
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Abstract — Advances  in  quantum  devices  have  brought  scal¬ 
able  quantum  computation  closer  to  reality.  We  focus  on  the 
system-level  issues  of  how  quantum  devices  can  be  brought 
together  to  form  a  scalable  architecture.  In  particular,  we  examine 
promising  silicon-based  proposals.  We  discover  that  communi¬ 
cation  of  quantum  data  is  a  critical  resource  in  such  proposals. 
We  find  that  traditional  techniques  using  quantum  SWAP  gates 
are  exponentially  expensive  as  distances  increase  and  propose 
quantum  teleportation  as  a  means  to  communicate  data  over 
longer  distances  on  a  chip.  Furthermore,  we  find  that  realistic 
quantum  error-correction  circuits  use  a  recursive  structure  that 
benefits  from  using  teleportation  for  long-distance  communi¬ 
cation.  We  identify  a  set  of  important  architectural  building 
blocks  necessary  for  constructing  scalable  communication  and 
computation.  Finally,  we  explore  an  actual  layout  scheme  for 
recursive  error  correction,  and  demonstrate  the  exponential 
growth  in  communication  costs  with  levels  of  recursion,  and  that 
teleportation  limits  those  costs. 

Index  Terms — Quantum  architecture,  quantum  computers,  sil¬ 
icon-based  quantum  computing. 


I.  Introduction 

MANY  important  problems  seem  to  require  exponential 
resources  on  a  classical  computer.  Quantum  com¬ 
puters  can  solve  some  of  these  problems  with  polynomial 
resources,  which  has  led  a  great  number  of  researchers  to 
explore quantum information processing technologies [1]–[7].
Early-stage quantum computers have involved a small number
of components (fewer than ten) and have utilized molecules in
solution and trapped ions [8]–[11]. To exploit our tremendous
historical  investment  in  silicon,  however,  solid-state  silicon 
quantum  computers  are  desirable.  Promising  proposals  along 
these  lines  have  begun  to  appear  [12],  [13];  these  even  include 
ideas  which  merge  atomic  physics  and  silicon  micromachining 
[14].  However,  as  the  number  of  components  grows,  quantum 
computing  systems  will  begin  to  require  the  same  level  of 
engineering  as  current  computing  systems.  The  process  of 
architectural  design  used  for  classical  silicon-based  systems, 
of  building  abstractions  and  optimizing  structures,  needs  to  be 
applied  to  quantum  technologies. 

Even  at  this  early  stage,  a  general  architectural  study  of 
quantum  computation  is  important.  By  investigating  the  po¬ 
tential  costs  and  fundamental  challenges  of  quantum  devices, 
we  can  help  illuminate  pitfalls  along  the  way  toward  a  scalable 
quantum  processor.  We  may  also  anticipate  and  specify  impor¬ 
tant  subsystems  common  to  all  implementations,  thus  fostering 
interoperability.  Identifying  these  practical  challenges  early  will 
help  focus  the  ongoing  development  of  fabrication  and  device 
technology.  In  particular,  we  find  that  transporting  quantum 
data  is  a  critical  requirement  for  upcoming  silicon-based 
quantum  computing  technologies. 

Quantum  information  can  be  encoded  in  a  number  of  ways, 
such  as  the  spin  component  of  basic  particles  like  protons  or 
electrons,  or  in  the  polarization  of  photons.  Thus,  there  are  sev¬ 
eral  ways  in  which  we  might  transfer  information.  First,  we 
might  physically  transport  particles  from  one  point  to  another. 
In  a  large  solid-state  system,  the  logical  candidate  for  informa¬ 
tion  carriers  would  be  electrons,  since  they  are  highly  mobile. 
Unfortunately,  electrons  are  also  highly  interactive  with  the  en¬ 
vironment  and,  hence,  subject  to  corruption  of  their  quantum 
state,  a  process  known  as  decoherence.  Second,  we  might  con¬ 
sider  passing  information  along  a  line  of  quantum  devices.  This 
swapping  channel  is,  in  fact,  a  viable  option  for  short  distances 
(as  discussed  in  Section  IV),  but  tends  to  accumulate  errors  over 
long  distances. 

Over  longer  distances,  we  need  something  fundamentally  dif¬ 
ferent.  We  propose  to  use  a  technique  called  teleportation  [15] 
and  to  call  the  resulting  long-distance  quantum  wire  a  teleporta¬ 
tion  channel  to  distinguish  from  a  swapping  channel.  Telepor¬ 
tation  uses  an  unusual  quantum  property  called  entanglement , 
which  allows  quantum  information  to  be  communicated  at  a 


1077-260X/03$17.00 © 2003 IEEE


COPSEY  et  al. :  TOWARD  SCALABLE,  SILICON-BASED  QUANTUM  COMPUTING  ARCHITECTURE 




distance.1  To  understand  the  mathematical  details  and  practical 
implications  of  teleportation,  we  will  need  to  cover  some  back¬ 
ground  before  returning  to  the  subject  in  Section  II-C. 

A  striking  example  of  the  importance  of  quantum  communi¬ 
cation  lies  in  the  implementation  of  error-correction  circuits. 
Quantum  computations  must  make  use  of  extremely  robust 
error-correction  techniques  to  extend  the  life  of  quantum  data. 
We  present  optimized  layouts  of  quantum  error-correction 
circuits  based  upon  quantum  bits  embedded  in  silicon. 

We  discover  two  interesting  results  from  our  quantum  lay¬ 
outs.  First,  the  recursive  nature  of  quantum  error  correction  re¬ 
sults  in  an  H- tree- structured  circuit  that  requires  long-distance 
communication  to  move  quantum  data  as  we  approach  the  root. 
Second,  the  reliability  of  the  quantum  SWAP  operator  is  perhaps 
the  most  important  operator  for  a  technology  to  implement  reli¬ 
ably  in  order  to  realize  a  scalable  quantum  computer. 

The  remainder  of  this  paper  continues  with  a  brief  introduc¬ 
tion  to  quantum  computing  in  Section  II.  We  describe  our  as¬ 
sumptions  about  implementation  technologies  in  Section  III. 
Next,  Section  IV  discusses  how  quantum  information  can  be 
transported  in  solid-state  technologies.  This  includes  a  discus¬ 
sion  of  short-distance  swapping  channels  and  the  more  scal¬ 
able  long-distance  teleportation  channels.  Section  V  introduces 
error-correction  algorithms  for  quantum  systems  and  discusses 
the  physical  layout  of  such  algorithms.  Then,  Section  VI  probes 
details  of  two  important  error-correction  codes.  Following  this, 
in  Section  VII,  we  demonstrate  the  need  for  teleportation  as  a 
long-distance  communication  mechanism  in  the  layout  of  recur¬ 
sive  error-correction  algorithms.  Finally,  Section  VIII  discusses 
system  bandwidth  issues  and  in  Section  IX  we  conclude. 

II.  Quantum  Computation 

We  begin  with  a  brief  overview  of  the  basic  terminology  and 
constructs  of  quantum  computation.  Our  purpose  is  to  introduce 
the  language  necessary  for  subsequent  sections;  in-depth  treat¬ 
ments  of  these  subjects  are  available  in  the  literature  [2]. 

A.  Quantum  States:  Qubits 

The state of a classical digital system X can be specified
by a binary string x composed of a number of bits $x_i$, each
of which uniquely characterizes one elementary piece of the
system. For n bits, there are $2^n$ possible states. The state of an
analogous quantum system $\Psi$ is described by a complex-valued
vector $|\Psi\rangle = \sum_x c_x |x\rangle$, a weighted combination (a “superposition”)
of the basis vectors $|x\rangle$, where the probability amplitudes
$c_x$ are complex numbers whose modulus squared sums to one,

$$\sum_x |c_x|^2 = 1.$$

A single quantum bit is commonly referred to as a qubit and
is described by the equation $|\psi\rangle = c_0|0\rangle + c_1|1\rangle$, where the $c_i$
are complex valued. Legal qubit states include “classical” computational
basis states $|0\rangle$ and $|1\rangle$, and states in superposition,
such as $\frac{1}{\sqrt{2}}|0\rangle + \frac{1}{\sqrt{2}}|1\rangle$, or $\frac{1}{2}|0\rangle - \frac{\sqrt{3}}{2}|1\rangle$. Larger quantum systems
can be composed from multiple qubits, for example, $|00\rangle$,
or $\frac{1}{2}|00\rangle + \frac{1}{\sqrt{2}}|01\rangle - \frac{1}{2}|11\rangle$. An n-qubit state is described by $2^n$

1The speed of this channel is, however, limited by the rate at which two
classical  bits  can  be  transmitted  from  source  to  destination,  without  which  the 
quantum  information  is  ambiguous. 


basis vectors, each with its own complex probability amplitude,
so an n-qubit system can exist in an arbitrary superposition of
the possible $2^n$ classical states of the system.

Unlike the classical case, however, where the total can be
completely characterized by its parts, the state of larger quantum
systems cannot always be described as the product of its parts.
This property, known as entanglement, is best illustrated with
an example: there exist no single-qubit states $|\psi_a\rangle$ and $|\psi_b\rangle$
such that the two-qubit state $|\Phi\rangle = \frac{1}{\sqrt{2}}|00\rangle + \frac{1}{\sqrt{2}}|11\rangle$ can be
expressed as the composite state2 $|\psi_a\rangle \otimes |\psi_b\rangle$. Entanglement
has no classical analogue. It is what gives quantum computers
their computational powers.
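
For two qubits, whether a pure state factors into a product of single-qubit states can be tested directly: the amplitudes $(c_{00}, c_{01}, c_{10}, c_{11})$ describe a product state exactly when $c_{00}c_{11} - c_{01}c_{10} = 0$. The sketch below applies this standard criterion to the entangled state from the example and to a simple product state; the function name and tolerance are illustrative.

```python
import math

def is_product_state(c00, c01, c10, c11, tol=1e-12):
    """True if the two-qubit pure state factors as |psi_a> (x) |psi_b>.

    A product state has amplitudes (a0*b0, a0*b1, a1*b0, a1*b1), so the
    2x2 matrix of amplitudes has zero determinant.
    """
    return abs(c00 * c11 - c01 * c10) < tol

s = 1 / math.sqrt(2)
entangled = (s, 0, 0, s)      # (|00> + |11>) / sqrt(2)
product = (s, 0, s, 0)        # (|0> + |1>) / sqrt(2)  (x)  |0>

print(is_product_state(*entangled))
print(is_product_state(*product))
```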

Although  a  quantum  system  may  exist  in  a  superposition  of 
orthogonal  states,  only  one  of  those  states  can  be  observed,  or 
measured.  After  measurement,  the  system  is  no  longer  in  su¬ 
perposition:  the  quantum  state  collapses  into  the  one  state  mea¬ 
sured,  and  the  probability  amplitude  of  all  other  states  goes  to 
zero. For example, when the state $\frac{1}{\sqrt{2}}|00\rangle + \frac{1}{\sqrt{2}}|11\rangle$ is measured,
the result is either 00 or 11, with equal probability; the outcomes
$|01\rangle$ or $|10\rangle$ never occur. Furthermore, if a subset of the qubits
in  a  system  is  measured,  the  remaining  qubits  are  left  in  a  state 
consistent  with  the  measurement. 
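
The measurement rule can be made concrete with a small state-vector sketch. It represents an n-qubit state as a list of $2^n$ amplitudes indexed by the basis string read as a binary number, with the first qubit as the most significant bit. That indexing convention and the function names are choices made here for illustration.

```python
import math

def measurement_probabilities(amplitudes):
    """Map each basis string to its probability |c_x|^2."""
    n = (len(amplitudes) - 1).bit_length()      # number of qubits
    total = sum(abs(c) ** 2 for c in amplitudes)
    assert math.isclose(total, 1.0), "state must be normalized"
    return {format(i, f"0{n}b"): abs(c) ** 2 for i, c in enumerate(amplitudes)}

def measure_first_qubit(amplitudes, outcome):
    """State of the remaining qubits after the first qubit is measured and
    found to be `outcome` (0 or 1). The outcome must have nonzero probability."""
    half = len(amplitudes) // 2
    kept = amplitudes[outcome * half:(outcome + 1) * half]
    norm = math.sqrt(sum(abs(c) ** 2 for c in kept))
    return [c / norm for c in kept]

s = 1 / math.sqrt(2)
bell = [s, 0, 0, s]                       # (|00> + |11>) / sqrt(2)
print(measurement_probabilities(bell))    # only "00" and "11" are nonzero
print(measure_first_qubit(bell, 1))       # second qubit is left in |1>
```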

Since  measurement  of  a  quantum  system  only  produces  a 
single  result,  quantum  algorithms  must  maximize  the  proba¬ 
bility  that  the  result  measured  is  the  result  desired.  This  may 
be  accomplished  by  iteratively  amplifying  the  desired  result,  as 
in Grover’s fast database search, $O(\sqrt{n})$ for a dataset of size
n  [16].  Another  option  is  to  arrange  the  computation  such  that 
it  does  not  matter  which  of  many  random  results  is  measured 
from  a  qubit  vector.  This  method  is  used  in  Shor’s  algorithm  for 
finding  a  factor  of  a  composite  integer  [17],  [18],  which  is  built 
upon  modular  exponentiation  and  a  quantum  Fourier  transform. 
For  the  interested  reader,  quantum  algorithms  for  a  variety  of 
problems  other  than  search  and  factoring  have  been  developed: 
adiabatic  solution  of  optimization  problems  (a  quantum  ana¬ 
logue  of  simulated  annealing;  complexity  unknown)  [19],  pre¬ 
cise  clock  synchronization  (using  EPR  pairs  to  synchronize  GPS 
satellites)  [20],  [21],  quantum  key  distribution  (provably  secure 
distribution  of  classical  cryptographic  keys)  [22],  and  very  re¬ 
cently,  Gauss  sums  [23],  and  Pell’s  equation  [24]. 

B.  Quantum  Gates  and  Circuits 

Just as classical bits are manipulated using gates such as NOT,
AND, and XOR, qubits are manipulated with quantum gates such
as those shown in Fig. 1. A quantum gate is described by a
unitary operator U. The output state vector is the operator applied
to the input vector; that is, $|\psi_{out}\rangle = U|\psi_{in}\rangle$. The classical
NOT has the quantum analogue X, which inverts the probabilities
of measuring 0 and 1. The quantum analogue of XOR is
the two-qubit CNOT gate: the target qubit is inverted for those
states where the source qubit is 1. Most quantum gates, however,
have no classical analogue. The Z gate flips the relative phase
of the $|1\rangle$ state, thus exchanging $\frac{1}{\sqrt{2}}(|0\rangle + |1\rangle)$ and $\frac{1}{\sqrt{2}}(|0\rangle - |1\rangle)$.
The Hadamard gate H turns $|0\rangle$ into $\frac{1}{\sqrt{2}}(|0\rangle + |1\rangle)$ and

2The composition operator for quantum systems is the tensor product, $\otimes$:
$|\psi_x\rangle \otimes |\psi_y\rangle = \sum_x c_x|x\rangle \otimes \sum_y c_y|y\rangle = \sum_{x,y} c_x c_y |x \otimes y\rangle$, where $x \otimes y$ is
simply the string formed by concatenating x and y.
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Fig.  1.  Basic  quantum  gates  and  their  matrix  representations. 


Fig. 2. Quantum teleportation of state $|a\rangle$. First,
entangled qubits $|b\rangle$ and $|c\rangle$ are distributed. Then, $|a\rangle$ is combined with $|b\rangle$,
after which measurements produce two classical bits of information (double
lines). After transport, these bits are used to manipulate $|c\rangle$ to regenerate state
$|a\rangle$ at the destination.


quantum-controlled gates. The symbol $\oplus$ is shorthand for the
target qubit of the CNOT gate.

C.  Quantum  Teleportation 

Quantum teleportation is the recreation of a quantum state at
a distance, using only classical communication. It accomplishes
this feat by using a pair of entangled qubits, $|\psi\rangle = \frac{1}{\sqrt{2}}(|00\rangle + |11\rangle)$,
called an EPR pair.3

Fig. 2 gives an overview of the teleportation process. We start
by generating an EPR pair. We separate the pair, keeping one
qubit, $|b\rangle$, at the source and transporting the other, $|c\rangle$, to the
destination. When we want to send a qubit, $|a\rangle$, we first interact
$|a\rangle$ with $|b\rangle$ using a CNOT gate. We then measure the phase of $|a\rangle$
and the amplitude of $|b\rangle$, send the two one-bit classical results
to the destination, and use those results to recreate the correct
phase and amplitude in $|c\rangle$ such that it takes on the original state
of $|a\rangle$. The recreation of phase and amplitude is done with X
and Z gates, whose application is contingent on the outcome
of the measurements of $|a\rangle$ and $|b\rangle$. Intuitively, since $|c\rangle$ has a
special relationship with $|b\rangle$, interacting $|a\rangle$ with $|b\rangle$ makes $|c\rangle$
resemble $|a\rangle$, modulo a phase and/or amplitude error. The two
measurements allow us to correct these errors and recreate $|a\rangle$
at the destination. Note that the original state of $|a\rangle$ is destroyed
when we take our two measurements.4
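
The protocol can be verified with a small state-vector simulation. The sketch below assumes the standard circuit reading of the description above: "measuring the phase" of $|a\rangle$ is modeled as a Hadamard gate followed by a computational-basis measurement, and the corrections are X when the $|b\rangle$ result is 1 and Z when the $|a\rangle$ result is 1. Instead of sampling a random outcome, it enumerates all four measurement results. The helper names and the qubit ordering (a, b, c) are illustrative choices.

```python
import math

N = 3  # qubits in order (a, b, c); a is the most significant bit of the index

def apply_cnot(state, control, target):
    new = [0j] * len(state)
    for i, amp in enumerate(state):
        bits = [(i >> (N - 1 - k)) & 1 for k in range(N)]
        if bits[control]:
            bits[target] ^= 1
        j = sum(b << (N - 1 - k) for k, b in enumerate(bits))
        new[j] += amp
    return new

def apply_h(state, qubit):
    s = 1 / math.sqrt(2)
    shift = N - 1 - qubit
    new = [0j] * len(state)
    for i, amp in enumerate(state):
        bit = (i >> shift) & 1
        new[i & ~(1 << shift)] += s * amp                       # |0> component
        new[i | (1 << shift)] += s * amp * (-1 if bit else 1)   # |1> component
    return new

def teleport(alpha, beta):
    """Return c's corrected state for every measurement outcome (m_a, m_b)."""
    s = 1 / math.sqrt(2)
    state = [0j] * 8
    # |a> (x) EPR pair on (b, c): (alpha|0> + beta|1>) (x) (|00> + |11>)/sqrt(2)
    state[0b000] = state[0b011] = alpha * s
    state[0b100] = state[0b111] = beta * s
    state = apply_cnot(state, control=0, target=1)
    state = apply_h(state, qubit=0)
    results = {}
    for m_a in (0, 1):
        for m_b in (0, 1):
            base = (m_a << 2) | (m_b << 1)
            c0, c1 = state[base], state[base | 1]
            norm = math.sqrt(abs(c0) ** 2 + abs(c1) ** 2)
            c0, c1 = c0 / norm, c1 / norm
            if m_b:                 # X correction
                c0, c1 = c1, c0
            if m_a:                 # Z correction
                c1 = -c1
            results[(m_a, m_b)] = (c0, c1)
    return results

for outcome, (c0, c1) in teleport(0.6, 0.8j).items():
    print(outcome, abs(c0 - 0.6) < 1e-9 and abs(c1 - 0.8j) < 1e-9)
```

Whichever of the four outcomes occurs, the destination qubit ends in the original state once the two classical bits have been used to select the corrections.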

Why bother with teleportation when we end up transporting
$|c\rangle$ anyway? Why not just transport $|a\rangle$ directly? First, we can
precommunicate EPR pairs with extensive pipelining without
stalling computations. Second, it is easier to transport EPR pairs
than real data. Since $|b\rangle$ and $|c\rangle$ have known properties, we can
employ a specialized procedure known as purification to turn
a collection of pairs partially damaged from transport into a
smaller collection of asymptotically perfect pairs. Third, transmitting
the two classical bits resulting from the measurements
is more reliable than transmitting quantum data.


$|1\rangle$ into $\frac{1}{\sqrt{2}}(|0\rangle - |1\rangle)$; it can be thought of as performing a
radix-2 Fourier transform. Another important single-qubit gate,
T, leaves $|0\rangle$ unchanged but multiplies $|1\rangle$ by $\sqrt{i}$. Single-qubit
gates are characterized by a rotation around an axis: X rotates
the qubit by $\pi$ around the x-axis; Z rotates by $\pi$ around the
z-axis; and T rotates by $\pi/4$ around the z-axis. By composing
the T and H gates, any single-qubit gate can be approximated
to arbitrary precision. The combination of T, H, and CNOT provides
a universal set: just as any Boolean circuit can be composed
from AND, OR, and NOT gates, any polynomially describable
multiqubit quantum transform U can be efficiently approximated
by composing just these three quantum gates into a circuit.
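
The single-qubit gates described in this section can be written as 2×2 matrices acting on the amplitude pair $(c_0, c_1)$. The sketch below uses the standard matrix forms, taking $T = \mathrm{diag}(1, e^{i\pi/4})$ as the matrix matching "multiplies $|1\rangle$ by $\sqrt{i}$", and checks the behaviors stated in the text. The helper names are illustrative.

```python
import cmath
import math

S = 1 / math.sqrt(2)
X = [[0, 1], [1, 0]]
Z = [[1, 0], [0, -1]]
H = [[S, S], [S, -S]]
T = [[1, 0], [0, cmath.exp(1j * math.pi / 4)]]

def apply(gate, state):
    """Multiply a 2x2 gate matrix by a single-qubit state (c0, c1)."""
    a, b = state
    return [gate[0][0] * a + gate[0][1] * b, gate[1][0] * a + gate[1][1] * b]

def close(u, v, tol=1e-12):
    return all(abs(p - q) < tol for p, q in zip(u, v))

zero, one = [1, 0], [0, 1]
plus, minus = [S, S], [S, -S]   # (|0> + |1>)/sqrt(2) and (|0> - |1>)/sqrt(2)

print(close(apply(X, zero), one))                 # X inverts 0 and 1
print(close(apply(Z, plus), minus))               # Z exchanges the two states
print(close(apply(H, zero), plus))                # H|0>
print(close(apply(H, one), minus))                # H|1>
print(abs(apply(T, one)[1] ** 2 - 1j) < 1e-12)    # T's phase squares to i
```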

One additional important operator is the SWAP gate. Just
as two classical values can be swapped using three XORs, a
quantum SWAP can be implemented as three CNOTs. However,
SWAP is often available natively for a given technology, which
is valuable, given its importance to quantum communication.
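
The three-CNOT construction can be checked on the four computational basis states. Because CNOT only permutes basis states and quantum gates are linear, agreement on the basis states implies agreement on every superposition. The function names below are illustrative.

```python
def cnot(bits, control, target):
    """Flip the target bit when the control bit is 1."""
    out = list(bits)
    if out[control]:
        out[target] ^= 1
    return tuple(out)

def swap_via_cnots(bits):
    """SWAP built from three CNOTs with alternating control and target."""
    bits = cnot(bits, control=0, target=1)   # (a, a^b)
    bits = cnot(bits, control=1, target=0)   # (b, a^b)
    bits = cnot(bits, control=0, target=1)   # (b, a)
    return bits

for a in (0, 1):
    for b in (0, 1):
        print((a, b), "->", swap_via_cnots((a, b)))
```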

Fig. 2 shows a quantum circuit for teleportation (described
in the next section). In quantum circuits, time goes from left
to right, where single lines represent qubits, and double lines
represent classical bits. A meter is used to represent measurement.
By convention, black dots represent control terminals for


III.  Solid-State  Technologies 

With some basics of quantum operations in mind, we turn
our attention to the technologies available to implement these
operations. Experimentalists have examined several technologies
for quantum computation, including trapped ions [26],
photons [27], bulk spin NMR [28], Josephson junctions [13],
[29], SQUIDs [30], electron spin resonance transistors [31],
and phosphorus nuclei in silicon (the “Kane” model) [12], [32].
Of these proposals, only the last three build upon a solid-state
platform; they are generally expected to provide the scalability
required to achieve a truly scalable computational substrate.

For the purposes of this paper, the key features of these
solid-state platforms are as follows.

1)  Quantum  bits  are  laid  out  in  silicon  in  a  two-dimensional 
(2-D)  fashion,  similar  to  traditional  CMOS  VLSI. 

2)  Quantum  interactions  are  near-neighbor  between  bits. 

³ An EPR or Einstein-Podolsky-Rosen pair is a special instance of
entanglement noted in the Einstein-Podolsky-Rosen paradox [25], [62].

⁴ This is consistent with the no-cloning theorem, which states that an arbitrary
quantum state cannot be perfectly copied; this is fundamentally because of the
unitarity of quantum mechanics.

Fig. 3. The basic quantum bit technology proposed by Kane [34]. Qubits are
embodied by the nuclear spin of a phosphorus atom coupled with an electron
embedded in silicon under high magnetic field at low temperature.


3)  Quantum  bits  cannot  move  physically,  but  quantum  data 
can  be  swapped  between  neighbors. 

4) The control structures necessary to manipulate the bits
prevent a dense 2-D grid of bits. Instead, we have linear
structures of bits which can cross, but there is a minimum
distance between such intersections that is on the order of
20 bits for our primary technology model [33]. This restriction
is similar to a “design rule” in traditional CMOS
VLSI.

These four assumptions apply to several solid-state technologies.
For concreteness, we will focus upon an updated version of
Kane’s phosphorus-in-silicon nuclear-spin proposal [34]. This
scheme will serve as an example for the remainder of the paper,
although we will generalize our results when appropriate.

Fig. 3 illustrates important dimensions of the Kane scheme.
Shown are two phosphorus atoms spaced 15-100 nm apart.
Quantum states are stored in relatively stable electron-donor
(e⁻–³¹P⁺) spin pairs, where the electron (e) and the donor
nucleus (n) have opposite spins. The basis states |0⟩ and |1⟩
are defined as the superposition states |0⟩ = |↑ₑ↓ₙ⟩ + |↓ₑ↑ₙ⟩
and |1⟩ = |↑ₑ↓ₙ⟩ − |↓ₑ↑ₙ⟩. Twenty nanometers above the
phosphorus atoms lie three classical control wires, one A gate
and two S gates. Precisely timed pulses on these gates provide
arbitrary one- and two-qubit quantum gates.

Single-qubit operators are composed of pulses on the A-gates,
modulating the hyperfine interaction between electron and nucleus
to provide Z-axis rotations. A globally applied static magnetic
field provides rotations around the X axis. By changing the
pulse widths, any desired rotational operator may be applied,
including the identity operator. Two-qubit interactions are mediated
by S-gates, which move an electron from one nucleus to
the next. The exact details of the pulses and quantum mechanics
of this technique are beyond the scope of this paper and are
described in [34].

Particularly apropos to the next few sections of this paper,
however, is the interqubit spacing of 15-100 nm. The exact
spacing is currently a topic of debate within the physics community,
with conservative estimates of 15 nm, and more aggressive
estimates of 100 nm. The tradeoff is between noise immunity
and difficulty of manufacturing. For our study, we will use a
figure (60 nm) that lies between these two. This choice implies
that the A and S gates are spaced 20 nm apart. We parameterize
our work, however, to generalize for changes in the underlying
technology.

The Kane proposal, like all quantum computing proposals,
uses classical signals to control the timing and sequence of
operations. All known quantum algorithms, including basic
error correction for quantum data, require the determinism and
reliability of classical control. Without efficient classical control,
fundamental results demonstrating the feasibility of quantum
computation do not apply (such as the Threshold Theorem used
in Section IV-B.3).

Quantum  computing  systems  display  a  characteristic  tension 
between  computation  and  communication.  Fundamentally, 
technologies  that  transport  data  well  do  so  because  they  are 
resistant  to  interaction  with  the  environment  or  other  quantum 
bits;  on  the  other  hand  technologies  that  compute  well  do  so 
precisely  because  they  do  interact.  Thus,  computation  and 
communication  are  somewhat  at  odds. 

In particular, atomic-based solid-state technologies are good
at providing scalable computation but complicate communication,
because their information carriers have nonzero mass. The
Kane proposal, for example, represents a quantum bit with the
nuclear spin of a phosphorus atom implanted in silicon. The
phosphorus atom does not move; hence, transporting this state
to another part of the chip is laborious and requires carefully
controlled swapping of the states of neighboring atoms. In contrast,
photon-based proposals that use polarization to represent
quantum states can easily transport data over long distances
through fiber. It is very difficult, however, to get photons to
interact and achieve any useful computation. Furthermore,
transferring quantum states between atomic- and photon-based
technologies is currently extremely difficult.

Optimizing these tensions, between communication and computation
and between classical control and quantum effects, implies
a structure to quantum systems. In this paper, we begin
to examine this optimization by focusing on communication in
solid-state quantum systems. Specifically, we begin by examining
the quantum equivalent of short and long “wires.”

IV.  Transporting  Quantum  Information:  Wires 

In this section, we explore the difficulty of transporting
quantum information within a silicon substrate. Any optimistic
view of the future of quantum computing includes enough
interacting devices to introduce a spatial extent to the layout
of those devices. This spatial dimension, in turn, introduces a
need for wires. One of the most important distinctions between
quantum and classical wires arises from the no-cloning theorem
[2]: quantum information cannot be copied but must
rather be transported from source to destination (see footnote 4).

Section IV-A begins with a relatively simple means of
moving quantum data via swap operations, called a swapping
channel. Unfortunately, the analysis of Section IV-B indicates
that swapping channels do not scale well, leading to an
alternative called a teleportation channel. This long-distance
technology is introduced in Section IV-C and analyzed in
Section IV-D.

Fig. 4. Short wires are constructed from successive qubits (phosphorus atoms).
Information in the quantum data path is swapped from qubit to qubit under
classical control. A single SWAP operator requires multiple A- and S-gate voltage
pulses. The control circuitry is not to scale.

[Fig. 5 labels: 10-nm access points contain only a handful of quantum
states for their electrons at temperatures less than 1 K, preventing correct
operation. As two physical dimensions of the access point exceed 100 nm,
thousands of electron states are held. Classically, electron states are
restricted to the access point; some electrons will, however, scatter into
the narrow wire and move ballistically downward, enabling proper control.]

A.  Short  Wires:  Swapping  Channel 

In  solid-state  technologies,  a  line  of  qubits  is  one  plausible 
approach  to  transporting  quantum  data.  Fig.  4  provides  a 
schematic  of  a  swapping  channel  in  which  information  is 
progressively  swapped  between  pairs  of  qubits  in  the  quantum 
datapath — somewhat  like  a  bubble  sort.5  Swapping  channels 
require  active  control  from  classical  logic  as  illustrated  by  the 
classical  control  plane  of  Fig.  4. 

As simple as it might appear, a quantum swapping channel
presents significant technical challenges. The first hurdle is the
placement of the phosphorus atoms themselves. The leading
work in this area has involved precise ion implantation through
masks, and manipulation of single atoms on the surface of
silicon [35]. For applications where only a few trial devices are
desired, slowly placing a few hundred thousand phosphorus atoms
with a probe device [36] may be possible. For bulk manufacturing,
DNA-based or other chemical self-assembly techniques [37]
may need to be developed. Note that,
while new technologies may be developed to enable precise
placement, the key for our work is only the spacing (60 nm)
of the phosphorus atoms themselves, and the number of control
lines (three) per qubit. The relative scale of quantum interaction
and the classical control of these interactions is what will lead
our analysis to the fundamental constraints on quantum
computing architectures.

A second challenge is the scale of classical control. Each control
line into the quantum datapath is roughly 10 nm in width.
While such wires are difficult to fabricate, we expect that either
electron beam lithography [38] or phase-shifted masks [39] will
make such scales possible.

A remaining challenge is the temperature of the device. In
order for the quantum bits to remain stable for a reasonable
period of time, the device must be cooled to less than one
kelvin. The cooling itself is straightforward, but the effect
of the cooling on the classical logic is a problem. Two issues
arise. First, conventional transistors stop working as the
electrons become trapped near their dopant atoms, which fail to
ionize. Second, the 10-nm classical control lines begin to exhibit
quantum-mechanical behavior, such as conductance quantization
and interference from ballistic transport [40].

⁵ For technologies that do not have an intrinsic swap operation, one can be
implemented by three CONTROLLED-NOT gates performed in succession. This is
a widely known result in the quantum computing field and we refer the interested
reader to [2].

Fig. 5. Quantization of electron states overcome by increasing the physical
dimension of the control lines beyond 100 nm. The states propagate
quantum-mechanically downward through access vias to control the magnetic
field around the phosphorus atoms.

Fortunately,  many  researchers  are  already  working  on 
low-temperature  transistors.  For  instance,  single-electron 
transistors  (SETs)  [41]  are  the  focus  of  intense  research  due  to 
their  high  density  and  low  power  properties.  SETs,  however, 
have  been  problematic  for  conventional  computing  because 
they  are  sensitive  to  noise  and  operate  best  at  low  temperatures. 
For  quantum  computing,  this  predilection  for  low  temperatures 
is  exactly  what  is  needed!  Tucker  and  Shen  describe  this 
complementary  relationship  and  propose  several  fabrication 
methods  in  [42]. 

On the other hand, the quantum-mechanical behavior of the
control lines presents a subtle challenge that has been mostly
ignored to date. At low temperatures, and in narrow wires, the
quantum nature of electrons begins to dominate over normal
classical behavior. For example, in 100-nm-wide polysilicon
wires at 100 mK, electrons propagate ballistically like
waves, through only one conductance channel, which has an
impedance given by the quantum of resistance, h/e² ≈ 25 kΩ.
Impedance mismatches to these and similar metallic wires
make it impossible to properly drive the ac current necessary to
perform qubit operations, in the absence of space-consuming
impedance matching structures such as adiabatic tapers.
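
The quoted quantum of resistance follows directly from the values of the Planck constant and the elementary charge. A short sketch of the arithmetic, using the exact SI-defined constants (the variable and function names are illustrative):

```python
# Exact SI-defined constants (2019 redefinition).
PLANCK_H = 6.62607015e-34            # joule-seconds
ELEMENTARY_CHARGE = 1.602176634e-19  # coulombs


def resistance_quantum_ohms():
    """Quantum of resistance h/e^2 for a single conductance channel."""
    return PLANCK_H / ELEMENTARY_CHARGE ** 2


r_q = resistance_quantum_ohms()  # roughly 25.8 kilo-ohms
```

This is several hundred times the tens of ohms typical of conventional transmission lines, which is why the text treats impedance mismatch as a real design constraint.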

Avoiding such limitations mandates a geometric design constraint:
narrow wires must be short and locally driven by nearby
wide wires. Using 100 nm as a rule of thumb⁶ for a minimum
metallic wire width sufficient to avoid undesired quantum
behavior at these low temperatures, we obtain a control gate
structure such as that depicted in Fig. 5. Here, wide wires terminate in
10-nm vias that act as local gates above individual phosphorus
atoms.

⁶ This value is based on typical electron mean free path distances, given known
scattering rates and the electron Fermi wavelength in metals.


Fig.  6.  A  linear  row  of  quantum  bits:  In  this  figure  (not  drawn  to  scale)  we  depict  access  control  for  a  line  of  quantum  bits.  On  the  left,  we  depict  a  “top  down” 
view.  On  the  right  is  a  vertical  cross-section  which  more  clearly  depicts  the  narrow-tipped  control  lines  that  quickly  expand  to  classical  dimensions. 


Fig. 7. Intersection of quantum bits. In this simplified view, we depict
a four-way intersection of quantum bits. An inversely (diamond shaped)
organized junction is also needed to densely pack junction cells.

Producing  a  line  of  quantum  bits  that  overcomes  all  of  the 
above  challenges  is  possible.  We  illustrate  a  design  in  Fig.  6. 
Note  how  access  lines  quickly  taper  into  upper  layers  of  metal 
and  into  control  areas  of  a  classical  scale.  These  control  areas 
can  then  be  routed  to  access  transistors  that  can  gate  on  and  off 
the  frequencies  (in  the  10s  to  100s  of  MHz)  required  to  apply 
specific  quantum  gates. 

Of course, any solution for data transport must also support
routing. Routing is not possible without fanout provided by wire
intersections. We can extend our linear row of quantum bits to
a four-way intersection capable of supporting sparsely
intersecting topologies of quantum bits. We illustrate the quantum
intersection in Fig. 7. This configuration is similar to Fig. 6
except that the intersection creates a more challenging tapering.

B.  Analysis  of  the  Swapping  Channel 

We now analyze our swapping channel to derive two
important architectural constraints: the classical-quantum
interface boundary and the latency-bandwidth characteristics.
We strive to achieve a loose lower bound on these constraints
for a given quantum device technology. While future quantum
technologies may have different precise numbers, it is almost
certain they will continue to be classically controlled and, thus,
also obey similar constraints based upon this classical-quantum
interface.

1) Pitch Matching: Our first constraint is derived from the
need to have classical control of our quantum operations. As
previously discussed, we need a minimum wire width to avoid
quantum effects in our classical control lines. Referring back to
Fig. 7, we can see that each quadrant of our four-way intersection
will need to be some minimum size to accommodate access
to our control signals.

Recall from Fig. 3 that each qubit has three associated control
signals (one A and two S gates). Each of these control lines must
expand from a thin 10 nm tip into a 100 nm access point in an
upper metal layer to avoid the effects of charge quantization at
low temperatures (Fig. 5). Given this structure, it is possible to
analytically derive the minimum width of a line of qubits and its
control lines, as well as the size of a four-way intersection. For
this minimum size calculation, we assume all classical control
lines are routed in parallel, albeit spread across the various metal
layers. This parallel nature makes this calculation trivial under
normal circumstances (sufficiently “large” lithographic feature
size λ_c), with the minimum line segment being equal in length
to twice the classical pitching, 150 nm in our case, and the
junction size equal to four times the classical pitching, 400 nm, in
size. However, we illustrate the detailed computation to make
the description of the generalization clearer. We begin with a
line of qubits.

Let N be the number of qubits along the line segment. Since
there are three gates (an A and two S lines), we need to fit in
3N classical access points of 100 nm in dimension each in line
width. We accomplish this by offsetting the access points in the
x and y dimensions (Fig. 6) by 20 nm. The total size of these
offsets will be 100 nm divided by the qubit spacing 60 nm times
the number of control lines per qubit (three), times the offset
distance of 20 nm. This number, (100 nm/60 nm) × 3 × 20 nm =
100 nm, is divided by 2 because the access lines are spread out
on each side of the wire. Hence, the minimum line segment will
be 100 nm + 50 nm = 150 nm. Shorter line segments within larger,
more specialized cells are possible.

Turning our attention to an intersection (Fig. 7), let N be the
number of qubits along each “spoke” of the junction. We need
to fit 3N classical access points in a space of (60 nm × N)²,
where each access point is at least 100 nm on a side. As with
the case of a linear row of bits, a 20-nm x and y shift in access
point positioning between layers is used for via access. Starting
with a single access pad of 100 nm, we must fit (100 nm/60 nm)
× 3 additional pads shifted in x and y within the single quadrant
of our intersection. This leads to a quadrant size of 100 nm +
(100 nm/60 nm) × 3 × 20 nm = 200 nm. Therefore, the minimum
size four-way intersection is eight (rounding up) qubits in each
direction.

In this construction, we have assumed a densely packed edge
to each spoke. However, this is easily “unpacked” with a
specialized line segment, or by joining to another junction that is
constructed inversely from that shown in Fig. 7. Obviously, the
specific sizes will vary according to technological parameters
and assumptions about control logic, but this calculation
illustrates the approximate effect of what appears to be a
fundamental tension between quantum operations and the classical
signals that control them. A minimum intersection size implies
minimum wire lengths, which imply a minimum size for
computation units.

2) Technology Independent Limits: Thus far, we have focused
our discussion on a particular quantum device technology.
This has been useful to make the calculations concrete.
Nevertheless, it is useful to generalize these calculations to future
quantum device technologies. Therefore, we parameterize our
discussion based on a few device characteristics as follows.

Assuming 2-D devices (i.e., not a cube of quantum bits), let
p_c be the classical pitching required, and p_q the quantum one.
Furthermore, let R be the ratio p_c/p_q of the classical to quantum
distance for the device technology, m be the number of classical
control lines required per quantum bit, and finally λ_c be the
feature size of the lithographic technology. We use two separate
variables p_c and λ_c to characterize the “classical” technology
because they arise from different physical constraints. The
parameter λ_c comes from the lithographic feature size, while p_c
(which is a function of λ_c) is related to the charge quantization
effect of electrons in gold. With the Kane technology we assume
a spacing p_q of 60 nm between qubits, three control lines per bit
of 100 nm (p_c) each, and a λ_c of 5 nm. We can use these to
generalize our pitch matching equations. Here, we find that the
minimum line segment is simply equivalent to R(1 + 2λ_c m/p_q)
qubits in length.

Examining our junction structure (Fig. 7), we note that it is
simply four line segments, similar to those calculated above,
except that the control lines must be on the same side. Therefore,
the minimum crossing size of quantum bits in a 2-D device is of
size ≈ 2R(1 + 4λ_c m/p_q) on a side.
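
As a check that the parameterized expressions reproduce the Kane-specific numbers derived earlier, the two formulas can be evaluated directly. This is a minimal sketch; the function names and the `KANE` parameter dictionary are illustrative and not taken from the paper.

```python
def min_line_segment_qubits(p_c, p_q, m, lambda_c):
    """Minimum line segment length in qubits: R * (1 + 2*lambda_c*m/p_q)."""
    r = p_c / p_q  # ratio of classical to quantum pitch
    return r * (1 + 2 * lambda_c * m / p_q)


def min_junction_side_qubits(p_c, p_q, m, lambda_c):
    """Minimum junction side in qubits: 2R * (1 + 4*lambda_c*m/p_q)."""
    r = p_c / p_q
    return 2 * r * (1 + 4 * lambda_c * m / p_q)


# Kane-technology assumptions stated in the text (all lengths in nm).
KANE = {"p_c": 100.0, "p_q": 60.0, "m": 3, "lambda_c": 5.0}

line_qubits = min_line_segment_qubits(**KANE)
junction_qubits = min_junction_side_qubits(**KANE)

# Convert back to physical length by multiplying by the qubit spacing.
line_nm = line_qubits * KANE["p_q"]          # about 150 nm
junction_nm = junction_qubits * KANE["p_q"]  # about 400 nm
```

With these inputs the line segment comes to 2.5 qubit spacings (150 nm) and the junction to 400 nm on a side, matching the figures quoted in the pitch-matching discussion.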

3) Latency and Bandwidth: Calculating the latency and
bandwidth of quantum wires is similar to, but slightly different
from, the calculation for classical systems. The primary difficulty is
decoherence (i.e., quantum noise). Unlike classical systems,
if you want to perform a quantum computation, you cannot
simply resend quantum information when an error is detected.
The no-cloning theorem prohibits transmission by duplication,
thereby making it impossible to retransmit quantum information
if it is corrupted. Once the information is destroyed by the
noisy channel, you have to start the entire computation over
(“no-cloning” also implies no checkpointing of intermediate
states in a computation). To avoid this loss, qubits are encoded
in a sufficiently strong error-correcting code that, with high
probability, will remain coherent for the entire length of the
quantum algorithm. Unfortunately, quantum systems will likely
be so error-prone that they will probably execute right at the
limits of their error tolerance [43].

Our goal is to provide a quantum communication layer which
sits below higher level error-correction schemes. Later, in
Section VIII, we discuss the interaction of this layer with quantum
error correction and algorithms. Consequently, we start our
calculation by assuming a channel with no error correction. Then,
we factor in the effects of decoherence and derive a maximum
wire length for our line of qubits.

Recall that data traverses the line of qubits with SWAP gates,
each of which takes approximately 1 µs to execute in the Kane
technology. Hence, to move quantum information over a space
of 60 nm requires 1 µs. A single row of quantum bits has
latency

$$ t_{\text{latency}} = d_{\text{qubits}} \times 1\,\mu\text{s} \tag{1} $$

where d_qubits is the distance in qubits, or the physical distance
divided by 60 nm. This latency can be quite large.
A short 1 µm wire has a latency of 17 µs. On the plus side, the
wire can be fully pipelined and has a sustained bandwidth of
1/(1 µs) = 1 million quantum bits per second (1 Mqbps).
This may seem small compared to a classical wire, but keep in
mind that quantum bits can enable algorithms with exponential
speedup over the classical case.

The  number  of  error-free  qubits  is  actually  lower  than  this 
physical  bandwidth.  Noise,  or  decoherence,  degrades  quantum 
states  and  makes  the  true  bandwidth  of  our  wire  less  than  the 
physical  quantum  bits  per  second.  Bits  decohere  over  time,  so 
longer  wires  will  have  a  lower  bandwidth  than  shorter  ones. 

The stability of a quantum bit decreases with time (much like
an uncorrected classical bit) as a function e^{−kt}. Usually, a
normalized form of this equation is used, e^{−λt}, where t in this new
equation is the number of operations and λ is related to the time
per operation and the original k. As quantum bits traverse the
wire, they arrive with a fidelity that decays exponentially with
latency, namely

$$ \text{fidelity} = e^{-\lambda t_{\text{latency}}}. \tag{2} $$

The true bandwidth is proportional to the fidelity

$$ \text{bandwidth}_{\text{true}} = \text{bandwidth}_{\text{physical}} \times \text{fidelity}. \tag{3} $$

Choosing a reasonable⁷ value of λ = 10⁻⁶, we find the true
bandwidth of a wire to be

$$ \text{bandwidth}_{\text{true}} = \frac{1}{1\,\mu\text{s}}\, e^{-10^{-6}\, d_{\text{qubits}}} \tag{4} $$

which for a 1 µm wire is close to the ideal (999 983 qbps).
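
The latency, fidelity, and true-bandwidth figures quoted for a 1 µm wire can be reproduced with a few lines of arithmetic. The sketch below assumes the Kane-model values used in the text (60 nm qubit spacing, 1 µs per SWAP, λ = 10⁻⁶ per operation); the constant and function names are illustrative only.

```python
import math

QUBIT_SPACING_M = 60e-9   # interqubit spacing assumed in the text
SWAP_TIME_S = 1e-6        # time per SWAP operation
LAMBDA_PER_OP = 1e-6      # decoherence rate per operation


def distance_in_qubits(length_m):
    """Physical wire length expressed in qubit spacings."""
    return length_m / QUBIT_SPACING_M


def latency_s(length_m):
    """Eq. (1): one SWAP per qubit spacing traversed."""
    return distance_in_qubits(length_m) * SWAP_TIME_S


def fidelity(length_m):
    """Eq. (2), with latency counted in operations (one per qubit)."""
    return math.exp(-LAMBDA_PER_OP * distance_in_qubits(length_m))


def true_bandwidth_qbps(length_m):
    """Eqs. (3)-(4): physical bandwidth scaled by the arrival fidelity."""
    return fidelity(length_m) / SWAP_TIME_S


wire = 1e-6  # a 1 micrometre wire
latency_us = latency_s(wire) * 1e6   # about 17 microseconds
bandwidth = true_bandwidth_qbps(wire)
```

For a single traversal the loss is only about 17 parts per million, which is why the effect looks negligible until many traversals are accumulated over a whole algorithm.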

This does not seem to be a major effect, until you consider
an entire quantum algorithm. Data may traverse back and forth
across a quantum wire millions of times. It is currently
estimated [45] that a degradation of fidelity more than 10⁻⁴ makes
arbitrarily long quantum computation theoretically unsustainable,
with the practical limit being far higher [43]. This limit is
derived from the Threshold Theorem, which relates the
decoherence of a quantum bit to the complexity of correcting this
decoherence (as discussed in detail in Section V) [45]-[47].⁸
Given our assumptions about λ, the maximum theoretical wire
distance is about 6 µm.

⁷ This value for λ is calculated from a decoherence rate of 10⁻⁶ per operation,
where each operation requires 1 µs. It is aggressive, but potentially achievable
with phosphorus atoms in silicon [32], [44].

Fig. 8. Architecture for a Quantum Wire: Solid double lines represent classical communication channels, while chained links represent a quantum swapping
channel. Single lines depict the direction in which the swapping channel is being used for transport.

4) Technology Independent Metrics: Our latency and bandwidth
calculations require slightly more device parameters. Let
t_swap be the time per basic SWAP operation. Some technologies
will have an intrinsic SWAP, and others will require synthesizing
the SWAP from 3 CNOT operations. Let λ be the decoherence rate,
which for small λ and t_swap is equivalent to the decoherence a
quantum bit undergoes in a unit of operation time t_swap. This
makes the latency of a swapping channel wire equal to

$$ t_{\text{latency}} = d_{\text{qubits}}\, t_{\text{swap}} \tag{5} $$

where the distance d_qubits is expressed in the number of qubits.
The bandwidth is proportional to the fidelity, or

$$ \text{bandwidth}_{\text{true}} = \frac{1}{t_{\text{swap}}}\, e^{-\lambda d_{\text{qubits}}}. \tag{6} $$

This bandwidth calculation is correct so long as the loss of
fidelity remains below the critical threshold C ≈ 10⁻⁴ required for
fault-tolerant computation, that is, so long as the fidelity stays
above 1 − C. Finally, the maximum distance of this swapping
channel is the distance at which the fidelity drops to 1 − C:

$$ d_{\text{qubits}} = \frac{\ln(1 - C)}{-\lambda}. \tag{7} $$
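
Evaluating Eq. (7) with the values assumed earlier (C ≈ 10⁻⁴, λ = 10⁻⁶ per operation, 60 nm qubit spacing) recovers the wire-length limit of about 6 µm. The following is a minimal sketch; the function name and constants are illustrative.

```python
import math

QUBIT_SPACING_NM = 60.0  # Kane-model spacing assumed in the text


def max_wire_qubits(decoherence_rate, threshold):
    """Eq. (7): qubit count at which fidelity e^(-lambda*d) falls to 1 - C."""
    return math.log(1 - threshold) / (-decoherence_rate)


max_qubits = max_wire_qubits(decoherence_rate=1e-6, threshold=1e-4)
max_length_um = max_qubits * QUBIT_SPACING_NM / 1000.0  # nm -> micrometres
```

Since ln(1 − C) ≈ −C for small C, the limit is approximately C/λ, or about 100 qubits, so improving λ by a factor of ten would extend the maximum swapping-channel length by roughly the same factor.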


No amount of error correction will be robust enough to
support a longer wire, while still supporting arbitrarily long
quantum computation. For this, we need a more advanced
architecture. One obvious option is to break the wire into
segments and insert “repeaters” in the middle. These quantum
repeaters are effectively performing state restoration (error
correction). However, we can do better, which is the subject of
the next section.

⁸ By “practical,” we mean without an undue amount of error correction. The
threshold theorem ensures that, theoretically, we can compute arbitrarily long
quantum computations, but the practical overhead of error correction makes the
real limit 2-3 orders of magnitude higher [43].

C. Long Wires: Teleportation Channel

In this section, we introduce an architecture for quantum
communication over longer distances in solid-state technologies,
shown in Fig. 8. This architecture makes use of the quantum
primitive of teleportation (described earlier in Section II-C). In
the next few sections, we provide a brief introduction to the core
components of this architecture.

Although teleportation and the mechanisms described in this
section are known in the literature, what has been missing is
the identification and analysis of which mechanisms form
fundamental building blocks of a realistic system. In this section,
we highlight three important architectural building blocks: the
entropy exchange unit, the EPR generator, and the purification
unit. Note that the description of these blocks is quasi-classical
in that it involves input and output ports. Keep in mind, however,
that all operations (except measurement) are inherently
reversible, and the specification of input and output ports merely
provides a convention for understanding the forward direction
of computation.

1 )  Entropy  Exchange  Unit:  The  physics  of  quantum  compu¬ 
tation  requires  that  operations  are  reversible  and  conserve  en¬ 
ergy.  The  initial  state  of  the  system,  however,  must  be  created 
somehow.  We  need  to  be  able  to  create  |0)  states.  Furthermore, 
decoherence  causes  qubits  to  become  randomized — the  entropy 
of  the  system  increases  through  qubits  coupling  with  the  ex¬ 
ternal  environment. 

Where  do  these  zero  states  come  from?  The  process  can  be 
viewed  as  one  of  thermodynamic  cooling.  “Cool”  qubits  are  dis¬ 
tributed  throughout  the  processor,  analogous  to  a  ground  plane 
in  a  conventional  CMOS  chip.  The  “cool”  qubits  are  in  a  nearly 
zero state. They are created by measuring the qubit and inverting
it if the result is $|1\rangle$. The measurement process itself requires a source
of  cold  spin-polarized  electrons  (created,  for  example,  using  a 
standard  technique  known  as  optical  pumping  [44],  [48]). 

As with all quantum processes, the measurement operation is
subject to failure but, with high probability, leaves the measured
qubit in a known state from which $|0\rangle$s may be obtained. To
arbitrarily increase this probability (and make an extremely cold
zero state) we can use a technique called purification.
Specifically, one realization employs an efficient algorithm for data
compression [49], [50] that gathers entropy across a number of
qubits into a small subset of high-entropy qubits. As a result, the
remaining qubits are reinitialized to the desired pure $|0\rangle$ state.

Fig. 9. Quantum EPR generator. Solid double lines represent classical
communication (or control), and single lines depict quantum wires.

Fig. 10. Quantum purification unit. EPR states are sufficiently regular that they
can be purified at the ends of a teleportation channel.

2) EPR Generator: Constructing an EPR pair of qubits is
straightforward. We start with two $|0\rangle$ state qubits from our
entropy exchange unit. A Hadamard gate is applied to the first of
these qubits. We then take this transformed qubit, which is in an
equal superposition of a zero and a one state, and use it as the
control qubit for a CNOT gate. The target qubit that is to be
inverted is the other fresh $|0\rangle$ qubit from the entropy exchange
unit. A CNOT gate is a bit like a classical XOR gate in that the
target qubit is inverted if the control qubit is in the $|1\rangle$ state.
Using a control qubit of $\frac{1}{\sqrt{2}}(|0\rangle + |1\rangle)$ and a target qubit of $|0\rangle$,
we end up with a two-qubit entangled state of $\frac{1}{\sqrt{2}}(|00\rangle + |11\rangle)$:
an EPR pair.
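
The construction just described can be checked numerically. The following is a minimal sketch, assuming an ideal (noise-free) Hadamard and CNOT and the basis ordering $|00\rangle, |01\rangle, |10\rangle, |11\rangle$; the function name `make_epr_pair` is ours and purely illustrative.

```python
import numpy as np

# Single-qubit Hadamard and two-qubit CNOT (control = first qubit) as matrices.
H = np.array([[1, 1], [1, -1]]) / np.sqrt(2)
I2 = np.eye(2)
CNOT = np.array([[1, 0, 0, 0],
                 [0, 1, 0, 0],
                 [0, 0, 0, 1],
                 [0, 0, 1, 0]])

def make_epr_pair():
    """Apply H to the first qubit of |00>, then CNOT(first -> second)."""
    zero_zero = np.array([1.0, 0.0, 0.0, 0.0])   # |00>
    after_h = np.kron(H, I2) @ zero_zero         # (|00> + |10>) / sqrt(2)
    return CNOT @ after_h                        # (|00> + |11>) / sqrt(2)

epr = make_epr_pair()
```

Only the $|00\rangle$ and $|11\rangle$ amplitudes of `epr` are nonzero, and they are equal, which is the entangled state given in the text.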

The  overall  process  of  EPR  generation  is  depicted  in  Fig.  9. 
Schematically,  the  EPR  generator  has  a  single  quantum  input 
and  two  quantum  outputs.  The  input  is  directly  piped  from  the 
entropy  exchange  unit  and  the  output  is  the  entangled  EPR  pair. 

3) EPR Purification Unit: The final building block we
require is the EPR purification unit. This unit takes as input $n$
EPR pairs, which have been partially corrupted by errors, and
outputs $nE$ asymptotically perfect EPR pairs. $E$ is the entropy
of entanglement, a measure of the number of quantum errors
which the pairs suffered. The details of this entanglement
purification procedure are beyond the scope of this paper but the
interested reader can see [51]-[53].

Fig. 10 depicts a purification block. The quantum inputs to
this block are the input EPR states and a supply of $|0\rangle$ qubits.
The outputs are pure EPR states. Note that the block is carefully
designed to correct only up to a certain number of errors; if
more errors than this threshold occur, then the unit fails with
increasing probability.

Fig.  8  illustrates  how  we  use  these  basic  building  blocks  and 
protocols  for  constructing  our  teleportation  channel.  The  EPR 
generator  is  placed  in  the  middle  of  the  wire  and  “pumps”  en¬ 
tangled  qubits  to  each  end  (via  a  pipelined  swapping  channel). 
These  qubits  are  then  purified  such  that  only  the  error-free 
qubits  remain.  Purification  and  teleportation  consume  zero-state 
qubits  that  are  supplied  by  the  entropy  exchange  unit.  Finally, 
the  coded-teleportation  unit  transmits  quantum  data  from  one 
end  of  the  wire  to  the  other  using  the  protocol  described  in 
Section  II-C.  Our  goal  now  is  to  analyze  this  architecture  and 
derive  its  bandwidth  and  latency  characteristics. 

D.  Analysis  of  the  Teleportation  Channel 

The  bandwidth  of  a  teleportation  channel  is  proportional  to 
the  speed  with  which  reliable  EPR  pairs  are  communicated. 
Since  we  are  communicating  unreliable  pairs,  we  must  purify 
them,  so  the  efficiency  of  the  purification  process  must  be  taken 
into  account.  Purification  has  an  efficiency  roughly  proportional 
to  the  fidelity  of  the  incoming,  unpurified  qubits  [49] 

$$\text{purification efficiency} \approx \text{fidelity}^2. \qquad (8)$$

Entropy  exchange  is  a  sufficiently  parallel  process  that  we  as¬ 
sume  enough  zero  qubits  can  always  be  supplied.  Therefore,  the 
overall  bandwidth  of  this  long  quantum  wire  is 

$$\text{bandwidth} = \frac{e^{-2 \times 10^{-6} \times d_{\text{qubits}}}}{1\ \mu\text{s}} \qquad (9)$$

which for a 1-µm wire is 999 967 qbps. Note that this result is
less than for the simple wiring scheme, but the decoherence
introduced on the logical qubits is only $O(e^{-\lambda \times 10})$. It is this latter
number, which does not change with wire length, that makes
an important difference. In the previous short-wire scheme we
could not make a wire longer than 6 µm. Here we can make a
wire of arbitrary length. For example, a 10-mm-long wire has a
bandwidth of 716 531 qbps, while a simple wire has an effective
bandwidth of zero at this length (for computational purposes).
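
The two bandwidth figures quoted above can be reproduced with a short calculation. The sketch below assumes that (9) reads $e^{-2 \times 10^{-6} \times d_{\text{qubits}}}$ per microsecond, with $d_{\text{qubits}}$ the wire length counted in qubits. It further assumes a spacing of 60 nm between neighboring qubits; that value is not stated in this section and is chosen here only because it reproduces both quoted numbers.

```python
import math

QUBIT_SPACING_NM = 60.0   # assumption: inferred from the quoted figures, not stated here
DECAY_PER_QUBIT = 2e-6    # coefficient in the exponent of (9)
PAIR_TIME_S = 1e-6        # the 1 microsecond denominator of (9)

def teleport_bandwidth_qbps(wire_length_nm):
    """Bandwidth of the teleportation channel in qubits per second."""
    d_qubits = wire_length_nm / QUBIT_SPACING_NM
    return math.exp(-DECAY_PER_QUBIT * d_qubits) / PAIR_TIME_S

bw_1um = teleport_bandwidth_qbps(1e3)     # 1 micrometer wire
bw_10mm = teleport_bandwidth_qbps(1e7)    # 10 millimeter wire
```

Under these assumptions the bandwidth drops by less than 30% between a 1-µm and a 10-mm wire, which is the slow distance dependence the text refers to.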

The  situation  is  even  better  when  we  consider  latency.  Unlike 
the  simple  wire,  the  wire  architecture  we  propose  allows  for  the 
precommunication  of  EPR  pairs  at  the  sustainable  bandwidth  of 
the  wire.  These  precommunicated  EPR  pairs  can  then  be  used 
for  transmission  with  a  constant  latency.  This  latency  is  roughly 
the time it takes to perform teleportation, or ≈ 20 µs. Note
that  this  latency  is  much  improved  compared  with  the  distance- 
dependent  simple  wiring  scheme. 

Using the same constants defined above for the swapping
channel, we can generalize our analysis of teleportation
channels. The latency is simply

$$t_{\text{latency}} = 10\, t_{\text{swap}} \qquad (10)$$

The bandwidth is

$$\text{bandwidth}_{\text{teleport}} = e^{-2 \lambda d_{\text{qubits}}} \qquad (11)$$


Unlike the short wire, this bandwidth is not constrained by
a maximum distance related to the Threshold Theorem since
teleportation is unaffected by distance. The communication of
EPR pairs before teleportation, however, can be affected by
distance, but at a very slow rate. While purification must discard
more corrupted EPR pairs as distance increases, this effect is
orders-of-magnitude smaller than direct data transmission over
short wires and is not a factor in a practical silicon chip of up to
tens of millimeters on a side.

V.  Fault-Tolerant  Architecture  and 
Geometric  Constraints 

We turn now to a key system requirement for quantum
computing: the ability to tolerate and dynamically handle internal
faults while preserving the integrity of the computation. Unlike
present classical computing systems, where the gate failure
probability is extremely low (MOSFETs in CMOS fail with
probability lower than $10^{-16}$ per operation), current and projected
quantum gates have $O(10^{-1})$ to $O(10^{-7})$ probabilities of
failure per operation.

Nevertheless,  as  was  mentioned  in  the  introduction,  a  main 
result  in  the  field  is  that  by  using  a  construction  involving  fine¬ 
grained  fault  tolerance,  an  arbitrarily  reliable  quantum  informa¬ 
tion  processor  can  be  efficiently  constructed  using  unreliable 
components  [45]. 

In  this  section,  we  study  geometric  constraints  on  scalable, 
quantum  fault-tolerant  construction.  Key  to  our  study  of 
quantum  wires  was  a  tradeoff  between  the  geometric  design  of 
the  system  and  the  noise  generated  during  operation:  shrinking 
the  wires  exposes  quantum  effects  in  conductivity  and  voltage, 
and  lowers  the  fidelity  of  the  operations  performed  on  the  qubit. 
Allocating  more  space  allows  us  to  reduce  the  noise;  however, 
there  is  a  different  way  to  use  this  spatial  resource.  Instead  of 
making  larger  gates  or  wires,  space  can  alternatively  be  used  to 
perform  computations  using  a  fault-tolerant  quantum  circuit, 
employing  redundant,  faulty  quantum  gates.  These  two  strate¬ 
gies  for  achieving  reliable  computation,  either  at  the  cost  of 
larger  devices,  or  at  the  cost  of  more  area  for  redundant  circuits, 
present  different  tradeoffs  between  space  and  reliability. 

In the remainder of this section, we present an explicit
analytical formula capturing this tradeoff, and demonstrate
some global geometric bounds on fault-tolerant quantum
computation. We begin in Section V-A with an overview of
fault tolerance, which is then described in detail in terms
of quantum error correction (Section V-B); how to compute
on encoded data (Section V-C); and how to do so recursively
(Section V-D). We then introduce our reliability model in
Section V-E, and describe how geometry is involved. This
leads to our main result of this section, in Section V-F.

A.  Overview  of  Quantum  Fault-Tolerant  Strategy 

In order for a system to operate reliably despite a partial
corruption of the data it processes, it must introduce a certain
amount of redundancy in the form of an error-correction code.
This protection can only be effective if the redundancy is present
at all times in the computational process. All operators need
to be consistently modified so as to compute directly on encoded
data. The choice of a code is dictated by three criteria. First,
it should minimize the complexity overhead due to the
aforementioned modification of the circuit. Second, redundancy
should be concentrated around strategic operators, those
whose erroneous behavior is both likely and critical for the
computation. Third, and this is a general requirement in coding
theory, the code should raise a syndrome allowing identification
and/or correction of the expected errors. The freedom in
encoding differs between a quantum and a classical context. The
no-cloning theorem forbids any data duplication in a quantum
system. On the other hand, coding and decoding schemes might
be drastically sped up by the use of quantum resources such as
entanglement.

It is possible to develop a fault-tolerant strategy for quantum
systems based on the recursive encoding of states by
concatenation of quantum error-correction codes (see Section V-D,
[2], and [54]). The main result we build upon is the following:
a quantum circuit containing $N$ error-free gates can be
simulated with a probability of failure of at most $\epsilon$ using
$O(\mathrm{poly}(\log(N/\epsilon))\, N)$ imperfect gates which fail with
probability $p$, as long as $p < p_{th}$, where $p_{th}$ is a constant threshold
that is independent of $N$. This remarkable result, the Threshold
Theorem [45], is achieved in three steps: 1) using quantum
error-correction codes (Section V-B); 2) performing all
computations on encoded data, using fault tolerant procedures
(Section V-C); and 3) recursively encoding until the desired
reliability is obtained (Section V-D). All of these results are
from prior literature [2], [45], [54]-[56], but we describe them
here to make our contributions clearer in later sections.

B.  Quantum  Error  Correction 

The  only  errors  which  can  occur  to  a  classical  bit  are  bit-flips 
and  erasures,  which  can  be  modeled  as  conditional  and  random 
NOT  gates.  Quantum  bits  suffer  more  kinds  of  error,  because  of 
the  greater  degree  of  freedom  in  their  state  representation;  sur¬ 
prisingly,  however,  there  are  general  strategies  for  reducing  the 
universe  of  possible  quantum  errors  to  only  two  kinds:  bit-flips 
(random  X  gates)  and  phase-flips  (random  Z  gates).  Classical 
error-correction  codes  only  take  into  account  bit  flip  errors  and, 
thus,  are  insufficient  for  correcting  quantum  data.  Furthermore, 
quantum  states  collapse  upon  measurement,  so  strategies  must 
be  employed  for  determining  errors  without  actually  measuring 
encoded  data. 

Classical error correction relies upon distributing $k$ bits
of information across $n$ bits ($n > k$) and ensuring enough
redundancy to recreate the original information. Because of the
no-cloning theorem, quantum information cannot be simply
duplicated. Instead, redundancy is achieved through entangled
states with known properties. For example, a single logical
qubit, $c_0|0_L\rangle + c_1|1_L\rangle$, can be represented using three physical
qubits, as the state $c_0|000\rangle + c_1|111\rangle$. A bit flip error on the first
(left-most) qubit would turn this into $c_0|100\rangle + c_1|011\rangle$; this
error can be detected by computing the parity of each pair of
qubits, and leaving the result in an extra qubit called an ancilla.
The three parities give the error syndrome, uniquely locating
any single bit-flip error. Crucially, this strategy reveals nothing
about the coefficients $c_0$ and $c_1$, since the parities cannot
distinguish between $|000\rangle$ and $|111\rangle$ or any single bit-flip
version of the two three-qubit strings. By measuring parities,
errors can be detected without collapsing encoded data.

Fig. 12. Syndrome measurement for a 3-qubit code. The meter boxes indicate
measurement, and the double lines indicate classical communication controlling
the application of the Z operator.

TABLE I
Phase Correction for a 3-Qubit Code

Parity (1,2)   Parity (2,3)   Error type        Action
0              0              no error          no action
0              1              qubit 3 flipped   flip qubit 3
1              0              qubit 1 flipped   flip qubit 1
1              1              qubit 2 flipped   flip qubit 2
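
The three-qubit example can be made concrete with a small state-vector sketch. It is only illustrative: the parities are read directly off the populated basis states instead of being gathered on ancilla qubits as a real circuit would, and it uses two of the pairwise parities, (1,2) and (2,3), which already suffice to locate a single flip. All function names are ours.

```python
import numpy as np

def encode(c0, c1):
    """Three-qubit bit-flip code: c0|000> + c1|111> as an 8-entry state vector."""
    state = np.zeros(8, dtype=complex)
    state[0b000] = c0
    state[0b111] = c1
    return state

def flip(state, qubit):
    """Apply X to `qubit` (1 = left-most) by permuting basis-state amplitudes."""
    mask = 1 << (3 - qubit)
    return state[np.arange(8) ^ mask]

def syndrome(state):
    """Parities of qubit pairs (1,2) and (2,3), shared by all populated basis states."""
    found = set()
    for index in np.flatnonzero(np.abs(state) > 1e-12):
        i = int(index)
        b1, b2, b3 = (i >> 2) & 1, (i >> 1) & 1, i & 1
        found.add((b1 ^ b2, b2 ^ b3))
    assert len(found) == 1, "not a code state with at most one bit flip"
    return found.pop()

# Syndrome -> qubit to flip back (None means no error).
CORRECTION = {(0, 0): None, (1, 0): 1, (1, 1): 2, (0, 1): 3}

def correct(state):
    qubit = CORRECTION[syndrome(state)]
    return state if qubit is None else flip(state, qubit)

original = encode(0.6, 0.8j)
damaged = flip(original, 1)      # c0|100> + c1|011>
recovered = correct(damaged)
```

Note that `syndrome` never looks at the amplitudes themselves, only at which basis states are populated, mirroring the point that the parities carry no information about $c_0$ and $c_1$.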

Correcting  phase  flips  is  achieved  by  measuring  differences 
in  phase  by  using  a  circuit  like  the  one  in  Fig.  11.  This  works 
by  using  a  Hadamard  gate  to  transform  phase  flips  into  bit  flips. 
Parities  are  then  measured  as  before,  the  results  stored  in  ancilla 
qubits,  and  then  the  qubits  are  transformed  back  into  their  orig¬ 
inal  basis.  Fig.  12  shows  how  a  phase  error  syndrome  can  be 
computed  and  a  corresponding  correction  procedure  applied  to 
correct  the  error,  following  the  specification  of  Table  I. 

A quantum code which encodes one qubit and allows any
single bit-flip or phase-flip error to be corrected uses the
encoding $c_0|0_L\rangle + c_1|1_L\rangle$, where the logical zero and one qubits
are

$$|0_L\rangle = \frac{(|000\rangle + |111\rangle) \otimes (|000\rangle + |111\rangle) \otimes (|000\rangle + |111\rangle)}{2\sqrt{2}}$$

$$|1_L\rangle = \frac{(|000\rangle - |111\rangle) \otimes (|000\rangle - |111\rangle) \otimes (|000\rangle - |111\rangle)}{2\sqrt{2}}.$$

This nine qubit code, discovered by Peter Shor [55], is also
known as the [9, 1, 3] code, in the notation [n, k, d], where n is
the number of physical qubits, k is the number of logical qubits
encoded, and d is the quantum Hamming distance of the code.
A code with distance d is able to correct (d − 1)/2 errors.


C.  Computing  on  Encoded  Data 

The nine qubit code has a remarkable property that illustrates
a key requirement for fault tolerance: applying a $Z$ gate to each
of the nine qubits takes $|0_L\rangle$ to $|1_L\rangle$ and vice versa. It is the
same as applying a logical $\bar{X}$ operator$^{9}$ to the encoded qubit!
Similarly, $\bar{Z}$ can be performed by applying an $X$ operator to
each qubit.
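
This transversal property is easy to verify numerically. The sketch below builds the two nine-qubit logical states as 512-entry vectors and applies $Z$ (respectively $X$) to every qubit; it assumes ideal gates and uses helper names of our own choosing.

```python
import numpy as np

def shor_logical(sign):
    """(|000> + sign*|111>) tensored three times, normalized by 2*sqrt(2)."""
    block = np.zeros(8)
    block[0b000] = 1.0
    block[0b111] = sign
    return np.kron(np.kron(block, block), block) / (2 * np.sqrt(2))

zero_l = shor_logical(+1)
one_l = shor_logical(-1)

# Z on every qubit multiplies basis state |b> by (-1)^(number of 1s in b).
ones = np.array([bin(i).count("1") for i in range(512)])
z_all = (-1.0) ** ones

# X on every qubit maps |b> to its bitwise complement, i.e. reverses the vector.
x_all_on_zero = zero_l[::-1]
x_all_on_one = one_l[::-1]
```

Multiplying by `z_all` swaps the two logical states (a logical bit flip), while reversing the vector leaves `zero_l` unchanged and negates `one_l` (a logical phase flip).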

In this paper, we employ Steane’s [7, 1, 3] code [57]. The
[7, 1, 3] code is the smallest code that allows direct fault-tolerant
application of nearly all the operators in the universal set
of operators discussed in Section II-B, namely the subset
{X, Z, H, CNOT}. The T gate can also be performed fault-tolerantly,
using a slightly more involved procedure. Thus, universal
computation is possible without requiring that the data be decoded.

Merely computing on encoded data is not sufficient, however;
one additional step is required, which is frequent error
correction. Because all gates used in this task are assumed to be
subject to failure, this must be done in a careful manner, such that
no single gate failure can lead to more than one error in each
encoded qubit block. Such constructions are known as fault
tolerant procedures, and the impact of this requirement on our
study is twofold: 1) no single operation may cause multiple
failures and 2) measurement errors must not be allowed to
propagate excessively. To achieve 1), no two encoding qubits are
allowed to both interact directly with a third qubit. Instead, the
“third” qubit is replaced with a cat state (a generalization of
an EPR pair), $\frac{1}{\sqrt{2}}(|00\ldots0\rangle + |11\ldots1\rangle)$, that has itself been
verified. Cat states are used because they do not transmit errors
through CNOT gates. To achieve 2), measurements are performed
redundantly. While it is not possible to copy a value before
measuring, it is possible to form a three-qubit state, similar
to the three-qubit bit-flip encoding (Section V-B), where all of
the qubits should measure to the same value; if one of the
measurements differs, it is assumed to be in error. The implications
are explored in detail in later examples.

Any logical operator may be applied as a fault tolerant
procedure, as long as the probability, $p$, of an error for a physical
operator is below a certain threshold, $1/c$, where $c$ is determined by
the implementation of the error-correction code. For the Steane
[7, 1, 3] code, $c$ is about $10^4$. The overall probability of error for
the logical operator is $cp^2$. That is, at some step in the
application of the operator, and subsequent error correction, two errors
would have to occur in order for the logical operator to fail.

D.  Recursive  Error  Correction 

A very simple construction allows us to tolerate additional
errors. If a logical qubit is encoded in a block of $n$ qubits, it is
possible to encode each of those $n$ qubits with an $m$-qubit code
to produce an $mn$-qubit encoding. Such recursion, or concatenation,
of codes can reduce the overall probability of error even further.
For example, concatenating the [7, 1, 3] code with itself gives
a [49, 1, 7] code with an overall probability of error of $c(cp^2)^2$
(see Fig. 13). Concatenating it $k - 1$ times gives $(cp)^{2^k}/c$, while
the size of the circuit increases by $d^k$ and the time complexity
increases by $t^k$, where $d$ is the increase in circuit complexity for
a single encoding, and $t$ is the increase in operation time for a
single encoding. For a circuit of size $p(n)$, to achieve a desired
probability of success of $1 - \epsilon$, $k$ must be chosen such that [2]

$$\frac{(cp)^{2^k}}{c} \le \frac{\epsilon}{p(n)}. \qquad (12)$$

The number of operators required to achieve this result is

$$O\!\left(\mathrm{poly}\!\left(\log\frac{p(n)}{\epsilon}\right) p(n)\right). \qquad (13)$$

9The overscore denotes an operator on a logical qubit: a logical operator.

Fig. 13. Tree structure of concatenated codes.
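
To illustrate how the recursion level is chosen, the sketch below searches for the smallest integer $k$ satisfying $(cp)^{2^k}/c \le \epsilon/p(n)$. The function names and the example parameters (a physical error rate of $10^{-5}$, a circuit of $10^9$ gates, a target failure probability of $10^{-2}$) are illustrative assumptions, not values taken from the analysis above; only $c \approx 10^4$ comes from the text.

```python
def logical_error(p, c, k):
    """Failure probability of one logical operator after k levels of encoding."""
    return (c * p) ** (2 ** k) / c

def concatenation_levels(p, c, eps, circuit_size):
    """Smallest k with (c*p)**(2**k) / c <= eps / circuit_size."""
    if not 0 < c * p < 1:
        raise ValueError("p must be below the threshold 1/c")
    k = 0
    while logical_error(p, c, k) > eps / circuit_size:
        k += 1
    return k

# Illustrative numbers: c = 1e4 (Steane code), p = 1e-5, 1e9 gates, eps = 1e-2.
levels = concatenation_levels(p=1e-5, c=1e4, eps=1e-2, circuit_size=1e9)
```

With these numbers $cp = 0.1$, so each extra level squares the factor $(cp)^{2^k}$, and `levels` comes out as 3. Setting $k = 0$ recovers the unencoded error rate $p$, and $k = 1$ gives $cp^2$, consistent with the single-level result quoted earlier.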


Fig. 14. Relation between circuit reliability and area required, showing the
general decreasing trend expected for $p(A)$, and achievable configurations using
either of two approaches, requiring area $A_\epsilon$ or $A_{ft}$, to achieve $p(A) < \epsilon$.


E.  Reliability  Versus  Resources 

Given  recursive  codings,  we  can,  in  principle,  reduce  the 
probability  of  error  to  arbitrarily  low  levels.  Another  way  to 
view  this  is  that  there  is  a  close  relationship  between  the  spatial 
resources  used  by  a  gate  and  its  reliability.  The  first  part  of 
this  paper  pointed  out  that  (at  least  in  the  Kane  model),  an 
essential  limitation  comes  from  the  rise  of  quantum  effects  in 
the  wires  driving  the  fields  used  to  control  individual  qubits 
and  their  interactions.  As  the  dimensions  of  the  wires  shrink, 
their  current  becomes  quantized,  leading  to  an  imperfect  field 
profile  around  the  controlled  qubit.  This  reduces  the  fidelity  of 
the  quantum  gate  performed  on  the  qubit.  This  observation  is 
very  interesting,  and  general.  Fundamentally,  classical  control 
circuitry becomes unreliable at small length scales, but the
reliability  increases  with  area  used. 

This failure behavior can be modeled in the following manner.
Assuming we have a quantum circuit consuming an area $A$ on a
layout, we may let $p(A)$ be its failure probability. The argument
above justifies the assumption that $p$ is a decreasing function
of $A$, and is given generically by a graph similar to Fig. 14.
For example, it is likely $p$ decreases exponentially, as
$p(A) \sim e^{-\gamma A}$, or for statistical errors, as a complementary error function,
$p(A) \sim \mathrm{erfc}(-\gamma A)$, for some technology-dependent parameter
$\gamma$.

F.  Criteria  for  the  Efficiency  of  Fault  Tolerance 

Given our model for failure probability as a function of area,
$p(A)$, and the resources required for the fault-tolerant scheme
using recursive encoding, from Section V-D, we can now
analytically express the tradeoff in the area required to
achieve a system of some specified reliability. We consider two
approaches. The first is simply to allocate a large area, such
that $p(A)$ is as small as desired. The alternative is to apply the
fault-tolerant construction using elementary building blocks
with a small area $A_0$, which fail with higher probability $p(A_0)$,
requiring an area of $A_0 d^k$.


Suppose we want to obtain a circuit whose failure probability
is bounded by $p < \epsilon$. The first approach involves using a large
area, $A_\epsilon = p^{-1}(\epsilon)$. The second approach utilizes a recursive,
fault tolerant construction, which makes sense if the component
area $A_0$ is chosen such that $\epsilon < p(A_0) < p_{th} = 1/c$, that
is, the component failure probability is smaller than the failure
probability threshold tolerated by the error-correction code. The
overall area required is then $A_{ft} = A_0 d^k$, where the recursion
level $k$ is determined by the solution to (12),


$$k = \log_2\left(\frac{\log(\epsilon c)}{\log(p(A_0)\, c)}\right). \qquad (14)$$


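The fault tolerant construction will be more efficient if and only
if $A_{ft} < A_\epsilon$ or, equivalently, if there exists an $A_0$ such that

$$p\left(A_0\, d^{\log_2\left(\frac{\log(\epsilon c)}{\log(p(A_0)\, c)}\right)}\right) > \epsilon \qquad (15)$$

$$\epsilon < p(A_0) < \frac{1}{c}. \qquad (16)$$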


This is an interesting, nonlinear inequality; solutions may be
visualized using Fig. 14. The existence of an area efficient
fault-tolerant implementation depends on the structure of the function
$p(A)$. If $p$ decreases slowly enough with the area (as an inverse
power of the area, for instance), then such an implementation
exists.
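
The dependence on the shape of $p(A)$ can be illustrated numerically. The sketch below compares the direct area $A_\epsilon = p^{-1}(\epsilon)$ with the fault-tolerant area $A_0 d^k$ for two hypothetical failure models, an inverse power law and an exponential. It rounds the recursion level up to an integer, and every numerical value except $c = 10^4$ (the per-level area factor $d = 30$, the target $\epsilon = 10^{-10}$, the areas $A_0$ and the decay constant) is an assumption chosen only for illustration.

```python
import math

def levels_needed(p0, c, eps):
    """Smallest integer k with (c*p0)**(2**k) / c <= eps."""
    k = 0
    while (c * p0) ** (2 ** k) / c > eps:
        k += 1
    return k

def compare_areas(p, p_inverse, a0, c, d, eps):
    """Return (A_eps, A_ft): the direct area and the fault-tolerant area a0 * d**k."""
    p0 = p(a0)
    if not eps < p0 < 1.0 / c:
        raise ValueError("need eps < p(A0) < 1/c")
    k = levels_needed(p0, c, eps)
    return p_inverse(eps), a0 * d ** k

C, D, EPS = 1e4, 30, 1e-10   # D and EPS are illustrative assumptions

# Inverse-power model p(A) = 1/A: reliability improves slowly with area.
power_direct, power_ft = compare_areas(
    lambda a: 1.0 / a, lambda q: 1.0 / q, 1e5, C, D, EPS)

# Exponential model p(A) = exp(-GAMMA * A): reliability improves quickly with area.
GAMMA = 1e-4
exp_direct, exp_ft = compare_areas(
    lambda a: math.exp(-GAMMA * a), lambda q: -math.log(q) / GAMMA, 1.2e5, C, D, EPS)
```

For the inverse-power model the fault-tolerant area is smaller than the direct one, whereas for the exponential model simply enlarging the device is cheaper, in line with the remark that an area-efficient construction exists when $p$ decreases slowly enough.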

Fault tolerance through recursive encoding can drastically
improve the reliability of quantum circuits, and perhaps even in
an error efficient manner. (16) gives a method for determining
the appropriate redundancy in the design of a quantum circuit
from the standpoint of area efficiency. The possibilities offered
by this strategy are far from being entirely explored. Some
solutions to this equation are presented elsewhere [58], but other
approaches, using different fault-tolerant schemes (such as
nonrecursive constructions), or quantum resource assisted fault
tolerance [59], may lead to modifications of this bound. In
general, however, the concept expressed by (16) will remain:
fine-grained fault tolerant circuit constructions can provide valuable
means for resource efficiency tradeoffs in future quantum
architectures.

G. Layout of Error-Correction Circuits

While  our  high-level  analysis  shows  that  recursive  error  cor¬ 
rection  has  desirable  efficiency  properties,  we  shall  see  that  the 
details  of  implementing  such  schemes  will  reveal  some  key  is¬ 
sues.  The  most  important  of  these  issues  is  the  need  for  reliable, 
long-distance  communication. 

Given  the  pitch-matching  constraints  of  linearity  with  infre¬ 
quent  junctions  from  IV-B-1,  there  are  still  several  ways  to  lay 
out  physical  and  logical  qubits.  Optimally,  qubits  should  be  ar¬ 
ranged  to  minimize  communication  overhead. 

In  a  fault  tolerant  design,  the  main  activity  of  a  quantum  com¬ 
puter  is  error  correction.  To  minimize  communication  costs, 
qubits  in  an  encoding  block  should  be  in  close  proximity.  As¬ 
suming  that  the  distance  between  junctions  is  greater  than  the 
number  of  qubits  in  the  block,  the  closest  the  qubits  can  be  is  in 
a  straight  line. 

A  concatenated  code  requires  a  slightly  different  layout.  Error 
correction  is  still  the  important  operation,  but  the  logical  qubits 
at  all  but  the  bottom  level  of  the  code  are  more  complicated.  For 
the  second  level,  the  qubits  are  themselves  simple  encodings, 
and  so  can  be  laid  out  linearly.  However,  we  want  these  qubits  in 
as  close  proximity  to  each  other  as  possible,  for  the  same  reasons 
we  wanted  the  qubits  in  the  simple  code  close.  Hence,  we  need 
to  arrange  the  bottom  level  as  branches  coming  off  of  a  main 
bus.  Similarly,  the  third  level  would  have  second-level  branches 
coming  off  of  a  main  trunk,  and  so  on  for  higher  levels. 

In  the  next  two  sections,  we  describe  a  basic  error-correction 
algorithm  and  its  recursive  application,  focusing  on  illustrating 
realistic  space  and  time  costs  such  as  those  described  above, 
imposed  by  2-D  implementation  technologies. 

VI.  Error-Correction  Algorithms 
A.  [7, 1, 3]  Code 

Error  correcting  using  the  [7, 1, 3]  code  consists  of  measuring 
the  error  syndrome  parities  of  the  encoding  qubits  in  various 
bases,  and  correcting  the  codeword  based  on  the  measured  syn¬ 
drome.  As  shown  in  Fig.  15,  the  qubits  are  rotated  to  the  different 
measurement  bases  with  Hadamard  gates.  Parity  is  then  mea¬ 
sured  in  much  the  same  way  as  with  a  classical  code,  using  two- 
qubit  CNOT  operators  acting  as  XORs.  Conceptually,  the  parity 
can  be  measured  in  the  same  way  as  the  three-qubit  code  in 
Section V-B, gathering the parity on ancilla $|0\rangle$s. To perform a
fault-tolerant measurement, however, a cat state is used in place
of a $|0\rangle$. Fig. 15 shows all six parity measurements using cat
states. Not shown are cat-state creation and cat-state
verification.

A  parity  measurement  consists  of  the  following  steps. 

1)  Prepare  a  cat  state  from  four  ancillae,  using  a  Hadamard 
gate  and  three  CNOT  gates. 


2)  Verify  the  cat  state  by  taking  the  parity  of  each  pair  of 
qubits.  If  any  pair  has  odd  parity,  return  to  step  1.  This 
requires  six  additional  ancillae,  one  for  each  pair. 

3)  Perform  a  CNOT  between  each  of  the  qubits  in  the  cat  state 
and  the  data  qubits  whose  parity  is  to  be  measured  (See 
Fig.  15). 

4) Uncreate the cat state by applying the same operators used
to create it in reverse order. After applying the Hadamard
gate to the final qubit, $|A_0\rangle$, that qubit contains the parity.

5) Measure $|A_0\rangle$:

A) With $|A_0\rangle = \alpha|0\rangle + \beta|1\rangle$, create the three-qubit state
$\alpha|000\rangle + \beta|111\rangle$ by using $|A_0\rangle$ as the control for two
CNOT gates, and two fresh $|0\rangle$ ancillae as the targets.

B) Measure each of the three qubits.

6) Use the majority measured value as the parity of the cat
state.
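
Steps 1, 3, and 4 can be traced in a small noise-free state-vector simulation. The sketch below is illustrative only: it omits cat-state verification (step 2) and the redundant measurement (steps 5 and 6), starts from data qubits in a computational basis state, and wraps the CNOTs in Hadamards on the data so that the cat state, acting as the control, picks up the bit parity of the data. Qubits 0 to 3 are the cat state and 4 to 7 the data; all names are ours.

```python
import numpy as np

N_CAT, N_DATA = 4, 4
N = N_CAT + N_DATA
HADAMARD = np.array([[1, 1], [1, -1]]) / np.sqrt(2)

def apply_h(state, q):
    """Hadamard on qubit q of a state stored as an array of shape (2,)*N."""
    moved = np.tensordot(HADAMARD, state, axes=([1], [q]))
    return np.moveaxis(moved, 0, q)

def apply_cnot(state, control, target):
    """Flip `target` on the half of the state where `control` is 1."""
    out = state.copy()
    sel = [slice(None)] * N
    sel[control] = 1
    axis = target if target < control else target - 1   # control axis is dropped
    out[tuple(sel)] = np.flip(state[tuple(sel)], axis=axis)
    return out

def measure_parity(data_bits):
    """Noise-free run of steps 1, 3 and 4; returns the final state array."""
    state = np.zeros((2,) * N)
    state[(0,) * N_CAT + tuple(data_bits)] = 1.0
    state = apply_h(state, 0)                 # step 1: Hadamard ...
    for t in (1, 2, 3):
        state = apply_cnot(state, 0, t)       # ... and three CNOTs
    for q in range(N_CAT, N):
        state = apply_h(state, q)             # rotate data to measurement basis
    for i in range(N_CAT):
        state = apply_cnot(state, i, N_CAT + i)   # step 3
    for q in range(N_CAT, N):
        state = apply_h(state, q)             # rotate data back
    for t in (3, 2, 1):
        state = apply_cnot(state, 0, t)       # step 4: reverse order
    return apply_h(state, 0)

final = measure_parity((1, 0, 1, 1))
peak = np.unravel_index(np.argmax(np.abs(final)), final.shape)
parity_qubit = int(peak[0])
```

After the run, qubit 0 holds the parity of the four data bits, the other three cat qubits are back in the zero state, and the data qubits are unchanged.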

Each  parity  measurement  has  a  small  probability  of  introducing 
an  error,  either  in  the  measurement,  or  in  the  data  qubits.  Hence, 
the  entire  syndrome  measurement  must  be  repeated  until  two 
measurements  agree.  The  resulting  syndrome  determines 
which,  if  any,  qubit  has  an  error,  and  which  X,  Z,  or  Y  operator 
should be applied to correct the error. After correction, the
probability of an error in the encoded data is $O(p^2)$.

For  the  Steane  [7,1,3]  code,  each  parity  measurement 
requires  twelve  ancillae — four  for  the  cat  state  to  capture  the 
parity,  six  to  verify  the  cat  state,  and  two  additional  qubits  to 
measure  the  cat  state.  The  six  parity  measurements  are  each 
performed  at  least  twice,  for  a  minimum  of  144  ancillae  to 
measure  the  error  syndrome! 
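
The ancilla count follows from simple bookkeeping, sketched below. The helper name is ours; the only inputs are the figures given in the text (a four-qubit cat state, one verification ancilla per pair of cat qubits, two extra qubits for the redundant measurement, six parities, each measured at least twice).

```python
from math import comb

def ancillae_per_parity(cat_size, extra_measure_qubits=2):
    """Cat-state qubits + one verification ancilla per pair + measurement qubits."""
    return cat_size + comb(cat_size, 2) + extra_measure_qubits

per_parity = ancillae_per_parity(4)      # 4 + 6 + 2
syndrome_total = per_parity * 6 * 2      # six parities, each measured at least twice
pairs_in_12_cat = comb(12, 2)            # pairwise verification of a 12-qubit cat state
```

The same pair count explains the 66 verification ancillae quoted for the 12-qubit cat state used in the concatenated case.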

The  minimum  number  of  operations  required  for  an  error 
correction  is  38  Hadamards,  288  CNOTs,  and  108  measure¬ 
ments.  With  parallelization,  the  time  required  for  the  operations 
is 24S + 156C + M, where S is the time required for a single
qubit  operator,  C  is  the  time  required  for  a  CNOT,  and  M  is  the 
time  required  for  a  measurement.  (We  assume  all  but  the  last 
measurement  are  performed  in  parallel  with  other  operations.) 

B.  Concatenated  Codes 

The [7, 1, 3] × [7, 1, 3] two-level concatenated code is
measured in the same way as the [7, 1, 3] code, except the qubits
are encoded, and each parity measurement uses a 12-qubit cat
state.$^{10}$

10In the [7, 1, 3] code, an $\bar{X}$ consists of an $X$ on each qubit. The parity of
the logical qubit is the same as that of the physical qubits. Since a logical qubit
is a valid codeword, a four-qubit subset of the qubits has even parity, and the
remaining three qubits have the same parity as the logical qubit.
Fig. 16. “Two-rail” layout for the three-qubit phase-correction code. The schematic on the left shows qubit placement and communication, where $D_i$s indicate
data qubits, and $A_i$s are cat-state ancillae. The column of $D'_i$s and $A'_i$s forms a swapping channel and can also interact with the data and cat-state ancillae. The open
qubit-swapping channel at the bottom brings in fresh ancillae, and removes used ancillae. The same layout is shown as a quantum circuit on the right, with the
operations required to create and verify an ancillary cat state, and to measure the parity of a pair of data qubits.


The error syndrome measurement is analogous to the
singly-encoded [7,1,3] case, except that the lower-level encodings
must be error corrected between the following operations.

1)  Prepare  12  ancillae  in  a  cat  state. 

2)  Verify  the  cat  state  (66  ancillae  for  pairwise  verification.) 

3)  Perform  CNOTs  between  the  cat  state  qubits  and  the  qubits 
encoding  the  data  qubits  whose  parity  is  to  be  measured. 

4)  Error  correct  the  four  logical  data  qubits. 

5)  Uncreate  the  cat  state,  and  measure  the  resulting  qubit. 

As in the singly-encoded case, each syndrome measurement
must be repeated, in this case at least four times. The resulting
syndrome determines which, if any, logical qubit has an error.
The appropriate X, Z, or Y operator can be applied to correct
the error. After the correction operator is applied to a logical
qubit, that qubit must be error-corrected. The probability of an
error in the encoded data is O(p^4) after correction.

Each parity measurement requires 154 Hadamards, 1307
CNOTs, and 174 measurements, in time 26S + 2016C + M,
using the same assumptions as for the nonconcatenated case.

Of course, the [7,1,3] code can be concatenated more than
once. The error-correction procedure for higher levels of
concatenation is similar to the above. The key is that the probability of
error for each parity measurement must be O(p^{2^k}), for a code
concatenated k - 1 times.
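
The scaling quoted above (O(p^2) for the [7,1,3] code, O(p^4) for the two-level code) can be sketched numerically. The snippet below is illustrative only: the function names and the constant c, which stands in for the unspecified prefactor hidden by the O(.) notation, are assumptions and do not appear in the analysis above.

```python
def error_exponent(levels: int) -> int:
    """Exponent of p after `levels` levels of [7,1,3] encoding (1 = nonconcatenated)."""
    return 2 ** levels


def error_bound(p: float, levels: int, c: float = 1.0) -> float:
    """Illustrative bound (c*p)**(2**levels) / c; c is a hypothetical prefactor."""
    return (c * p) ** error_exponent(levels) / c


if __name__ == "__main__":
    for levels in (1, 2, 3):
        print(levels, error_exponent(levels), error_bound(1e-3, levels))
```

With c = 1 the bound reduces to p^2 and p^4 for one and two levels of encoding, matching the two cases stated in the text.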


VII. Communication Costs and Error Correction

In this section, we model the communication costs of the
error-correction algorithms of Section VI, under the constraint
of having only near-neighbor interactions. While it has previously
been proven that under such constraints, the Threshold
Theorem can still be made to apply (given suitably reduced
failure probability thresholds) [60], a detailed study was not
performed with layout constraints on quantum error-correction
circuits. We first study the growth rate of errors when using SWAP
operations. Second, we analyze quantum teleportation as an
alternative to SWAP operations for long-distance communication.
Finally, we show that teleportation is preferable both in terms of
distance and in terms of the accumulating probability of
correlated errors between redundant qubits in our codewords.

A. Error-Correction Costs

The error-correction algorithms in the previous section are
presented for the ideal situation, where any qubit can interact
with any other qubit. Usually, qubits can only interact with their
near neighbors, so before applying a two-qubit operator, one of
the operand qubits must be moved adjacent to the other.

One of the easiest ways to move quantum data is to use the
SWAP operator. By applying SWAPs between alternating pairs of
qubits, the values of alternating qubits are propagated in one
direction, while the remaining qubit values are propagated in the
reverse direction. This swapping channel can be used to supply
|0⟩ ancillae for the purpose of error correction, remove "used"
ancillae, and allow for qubit movement. Fig. 16 illustrates this
for the three-qubit example, using two columns of qubits, one
for the data and cat-state qubits, and one for communication.

The same layout can be applied to the [7,1,3] code, giving a
minimum time for an error-correction parity check of

t_{ecc} = 12 (t_{cc} + t_{cv} + t_{p} + t_{cu} + t_{m})    (17)

where

t_{cc}  time for cat-state creation;
t_{cv}  time for cat-state verification;
t_{p}   time to entangle the cat state with the parity qubits;
t_{cu}  time to uncreate the cat state; and
t_{m}   time to perform a triply-redundant measurement.

For [7,1,3] in the ideal, parallel, "sea-of-qubits" model,
t_{cc} = t_{single} + 3 t_{cnot}, t_{cv} = 6 t_{cnot} + t_{meas}, t_{p} = t_{cnot}, and
t_{cu} = 3 t_{cnot} + t_{single}, where

t_{single}  time required for a single-qubit operator;
t_{cnot}    time required for a CNOT operator;
t_{swap}    time required for a SWAP operator;
t_{meas}    time required for redundant measurement.
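
As a consistency check, the sea-of-qubits stage times can be summed and scaled by the factor of 12 in (17); the result should reproduce the 24S + 156C single-qubit and CNOT terms quoted for the ideal Steane [7,1,3] syndrome measurement. The sketch below assumes the stage times read as t_cc = t_single + 3 t_cnot, t_cv = 6 t_cnot + t_meas, t_p = t_cnot, and t_cu = 3 t_cnot + t_single; the dictionary layout and function names are illustrative.

```python
from collections import Counter

# Stage times in the ideal "sea-of-qubits" model, as multiples of basic gate times.
STAGES = {
    "t_cc": Counter(single=1, cnot=3),   # cat-state creation
    "t_cv": Counter(cnot=6, meas=1),     # cat-state verification
    "t_p": Counter(cnot=1),              # entangle cat state with parity qubits
    "t_cu": Counter(cnot=3, single=1),   # uncreate the cat state
}


def per_parity_check(stages):
    """Sum the gate-time coefficients of all stages of one parity check."""
    total = Counter()
    for cost in stages.values():
        total += cost
    return total


def scale(cost, factor):
    """Multiply every coefficient by `factor` (12 = 6 parities x 2 repetitions)."""
    return Counter({gate: factor * n for gate, n in cost.items()})


syndrome = scale(per_parity_check(STAGES), 12)
print(dict(syndrome))
```

The single-qubit and CNOT coefficients come out as 24 and 156. The measurement coefficient is 12 here because this sketch does not model overlapping measurements with other operations; the text's single M term relies on that overlap.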
Fig. 17. Schematic layout of the H-tree structure of a concatenated code. The
branches labeled | Z+) are for logical data qubits, and consist of two rails of
eleven qubits each: seven qubits for data and four for ancillae. The branch
labeled | Ax) is for creating, verifying, and uncreating the cat state.


If communication by swapping is used

t_{cc} = \max(t_{single}, t_{swap}) + 6 t_{swap} + 3 \max(t_{cnot}, t_{swap})    (18)

t_{cv} = \max(t_{single}, t_{swap}) + 9 t_{swap} + 11 \max(t_{cnot}, t_{swap})    (19)

t_{p} = 7 t_{swap} + 4 \max(t_{cnot}, t_{swap})    (20)

t_{cu} = t_{swap} + 3 t_{cnot} + t_{single} + t_{meas}.    (21)


In the Kane model, t_{single} < t_{swap} < t_{cnot} \ll t_{meas}.
Including parallelism between parity measurements, the minimum
time for a syndrome measurement is

t_{ecc} = 221 t_{swap} + 210 t_{cnot} + t_{single} + t_{meas}.


Since measurement is fully parallelizable, these times assume
that there are enough measurement units to perform measurement
in parallel with the other operations in the error-correction
cycle.
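
The swapping-channel stage times (18)-(21) are easy to evaluate for any set of gate times. The sketch below is a minimal illustration; the function name and the numeric gate times in the usage line are hypothetical (they are not Kane-model parameters), and the coefficients follow our reading of (18)-(21).

```python
def swap_channel_stage_times(t_single, t_swap, t_cnot, t_meas):
    """Evaluate (18)-(21) for one parity check on the two-rail swapping layout."""
    t_cc = max(t_single, t_swap) + 6 * t_swap + 3 * max(t_cnot, t_swap)
    t_cv = max(t_single, t_swap) + 9 * t_swap + 11 * max(t_cnot, t_swap)
    t_p = 7 * t_swap + 4 * max(t_cnot, t_swap)
    t_cu = t_swap + 3 * t_cnot + t_single + t_meas
    return {"t_cc": t_cc, "t_cv": t_cv, "t_p": t_p, "t_cu": t_cu}


# Hypothetical gate times in arbitrary units, ordered single < swap < cnot < meas.
times = swap_channel_stage_times(t_single=1, t_swap=2, t_cnot=3, t_meas=10)
print(times)
```

Multiplying the sum of these stages by 12 gives a serial upper bound; the smaller syndrome-measurement time quoted above additionally exploits parallelism between parity measurements, which this sketch does not model.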


B.  Multilevel  Error  Correction 

For the singly concatenated code, the data movement in the
upper level is more complicated, although (17) still holds. The
first step in the error correction is creating and verifying the
12-qubit cat state. Fig. 17 shows how the ancillae "branches"
are incorporated into the data branches. After verification, the
cat state is moved to the appropriate data branches, where it is
CNOTed with the data qubits. The cat state is then moved back
and uncreated, while the data branches are error-corrected.
Finally, a Hadamard is applied to the last cat-state ancilla, which
is then redundantly measured. The layout in Fig. 17 is not
necessarily optimal.

For  [7, 1, 3]  concatenated  with  itself  k  times 


tcc,k  ~  rio§2(^+)l  Anot  4“  ^  2 

■3). 

0  1  ''swap 

(22) 

tcv,k  — ZUktcnot  4“  {p’kip’k  2)  +  2)  Awap 

(23) 

tp:k  —  4”  3A,/c  4“  3 4+— 1  4“  Acc. 

,k- 1 

(24) 

tcu,k  —  Ac,fc  4“  Aingle  4“ 

(25) 

a_k = 4 \times 3^{k-1}

(26)

k  =  1 

4+  =  < 

\  B , 

I  4+-  1  +  (n  +  Ul)tb,k-2, 

k  =  2 
k  =  3 

(27) 

l  4,fc- 1  +  2  |"f  ]  4,fc- 2, 

k>  3 

where the subscript k indicates the level of encoding, a_k is the
number of qubits in the cat state at level k, t_{b,k} is the branch
distance between logical qubits at level k, B is the minimum
number of qubits between two branches for a given architectural
model, and n is the number of physical qubits in the
nonconcatenated code.

With communication by swapping channel, the SWAP operator
becomes very important. In the sea-of-qubits model, SWAPs
are not required. In the model described above, SWAPs account
for over 80% of all operations.


C.  Avoiding  Correlated  Errors 


An  important  assumption  in  quantum  error  correction  is  that 
errors  in  the  redundant  qubits  of  a  codeword  are  uncorrelated. 
That  is,  we  do  not  want  one  error  in  a  codeword  to  make  a  second 
error  more  likely.  To  avoid  such  correlation,  it  is  important  to  try 
not  to  interact  qubits  in  a  codeword  with  each  other. 

Unfortunately,  we  find  that  a  2-D  layout  cannot  avoid  indirect 
interaction  of  qubits  in  a  codeword.  At  some  point,  all  the  qubits 
in  a  codeword  must  be  brought  to  the  same  physical  location  in 
order  to  calculate  error  syndromes.  In  order  to  do  this,  they  must 
pass  through  the  same  line  of  physical  locations.  Although  we 
can  avoid  swapping  the  codeword  qubits  with  each  other,  we 
cannot  avoid  swapping  them  with  some  of  the  same  qubits  that 
flow  in  the  other  direction. 

For concreteness, if two qubits of a codeword, d_0 and d_1, both
swap with an ancilla a_0 going in the opposite direction, there
is some probability that d_0 and d_1 will become correlated with
each other through the ancilla. This occurs if both SWAPs
experience a partial failure. In general, if p is the probability of a
failure of a SWAP gate, the probability of an error from swapping
a logical qubit is


nkbkp 


bkP 3  H - 


where b_k is the number of qubits between branches at level k,
and the higher order terms are due to correlation between the
qubits. From this form, it is clear that correlated errors are
dominated by uncorrelated errors when n_k p \ll 1.
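
The indirect interaction described above can be made concrete with a toy simulation of a swapping channel, in which SWAPs are applied to alternating neighbor pairs so that two streams of qubits flow in opposite directions. This is only an illustrative sketch: the qubit labels, the initial placement, and the function name are assumptions, and no quantum state is simulated, only which qubits meet in a SWAP gate.

```python
def run_swap_channel(cells, steps):
    """Apply SWAPs to alternating neighbor pairs; record every pair that meets."""
    cells = list(cells)
    partners = {q: [] for q in cells}
    for step in range(steps):
        # Even steps swap pairs (0,1), (2,3), ...; odd steps swap (1,2), (3,4), ...
        for i in range(step % 2, len(cells) - 1, 2):
            left, right = cells[i], cells[i + 1]
            partners[left].append(right)
            partners[right].append(left)
            cells[i], cells[i + 1] = right, left
    return cells, partners


# Codeword qubits d0, d1 flow to the right; ancillae a0, a1 flow to the left.
final_cells, partners = run_swap_channel(["d0", "a1", "d1", "a0"], steps=2)
print(final_cells)
print(partners["a0"])
```

In this placement the ancilla a0 is swapped first with d1 and then with d0, so a partial failure in both SWAPs could correlate the two codeword qubits even though d0 and d1 never interact directly.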

By calculating the number of basic computation and communication
operations necessary to use teleportation for long-distance
communication, we can quantify when we should switch
from swapping to teleportation in our tree structure. Fig. 18
illustrates this tradeoff. We can see that for B = 22, teleportation
should be used when k ≥ 5, the first level at which the swapping
cost in Table II exceeds the fixed teleportation cost.


D.  Teleportation 

Table II lists the number of SWAP operations required to move
an unencoded qubit from one level-k code word to the adjacent
code word for different minimum branch distances, as well as
the total operations to teleport the same qubit. Since a
teleportation channel precommunicates EPR pairs, it has a fixed cost. To
use teleportation for our circuit, we must evaluate the number
of computation and communication operations within the
teleportation circuit. By comparing this number of operations with
the swapping costs from the previous section, we can decide at
what level k of the tree to start using teleportation instead of
swapping for communication.
Fig. 18. Cost of teleportation compared to swapping. The B-values chosen
illustrate break-even points for different levels of recursion.


TABLE II

Comparison of the Cost of Swapping an Encoded Qubit to the
Cost of Teleporting it. The B-Values are the Distance
Between Adjacent Qubits

k    Teleportation    Swapping, B = 22    Swapping, B = 61    Swapping, B = 285
1    864              1                   1                   1
2    864              22                  61                  285
3    864              77                  194                 866
4    864              363                 948                 4,308
5    864              1,199               3,032               13,560
6    864              4,543               11,680              52,672
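
The break-even level for each branch distance can be read directly off Table II. The following sketch transcribes the table and reports the first level k at which swapping costs more than the fixed teleportation cost; the variable and function names are illustrative.

```python
TELEPORT_COST = 864  # fixed cost of teleporting one qubit (Table II)

# Swapping cost for k = 1..6, keyed by minimum branch distance B (Table II).
SWAP_COST = {
    22: [1, 22, 77, 363, 1199, 4543],
    61: [1, 61, 194, 948, 3032, 11680],
    285: [1, 285, 866, 4308, 13560, 52672],
}


def break_even_level(swap_costs, teleport_cost=TELEPORT_COST):
    """Return the smallest level k whose swapping cost exceeds teleportation."""
    for k, cost in enumerate(swap_costs, start=1):
        if cost > teleport_cost:
            return k
    return None  # swapping is cheaper at every tabulated level


for B, costs in SWAP_COST.items():
    print(f"B = {B}: teleport from level k = {break_even_level(costs)}")
```

For the tabulated values this gives k = 5, 4, and 3 for B = 22, 61, and 285, respectively, consistent with the statement that the B-values were chosen to illustrate break-even points at different levels of recursion.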

Teleportation  has  another  advantage,  which  is  beyond  the 
scope  of  this  study.  By  suitably  modifying  the  EPR  pairs, 
teleportation  can  be  used  to  perform  operations  at  a  distance 
[59].  It  does  not  eliminate  the  need  for  error  correction,  and 
correctly  modifying  the  EPR  pairs  has  its  own  costs.  This  is  an 
interesting  area  for  future  research. 

VIII.  System  Bandwidth 

Our goal has been to design a reliable, scalable quantum
communication layer that will support higher-level quantum error
correction and algorithms functioning on top of this layer. A key
issue for future evaluation, however, is that the lower latency of
our teleportation channel actually translates to an even higher
bandwidth when the upper layers of a quantum computation are
considered. It is for this reason that long wires should not be
constructed from chained swapping-channels and quantum
"repeaters."

The intuition behind this phenomenon is as follows. Quantum
computations are less reliable than any computation technology
that we are accustomed to. In fact, quantum error correction
consumes an enormous amount of overhead both in terms of
redundant qubits and time spent correcting errors. This overhead is
so large that the reliability of a computation must be tailored
specifically to the run length of an algorithm. The key is that
the longer a computation runs, the stronger the error correction
needed to allow the data to survive to the end of the
computation. The stronger the error correction, the more bandwidth
consumed transporting redundant qubits. Thus, lower latency
on each quantum wire translates directly into greater effective
bandwidth of logical quantum bits.

IX.  Conclusion 

Quantum computation is in its infancy, but now is the time
to evaluate quantum algorithms under realistic constraints and
derive the architectural mechanisms and reliability targets that
we will need to scale quantum computation to its full potential.
Our work has focused upon the spatial and temporal constraints
of solid-state technologies.

Building upon key pieces of quantum technology, we have
provided an end-to-end look at a quantum wire architecture. We
have exploited quantum teleportation to enable pipelining and
flexible error correction. We have shown that our teleportation
channel scales with distance and that swapping channels do not.
Finally, we have discovered fundamental architectural pressures
not previously considered. These pressures arise from the need
to colocate physical phenomena at both the quantum and
classical scale. Our analysis indicates that these pressures will force
architectures to be sparsely connected, resulting in coarser-grain
computational components than generally assumed by previous
quantum computing studies.

At the systems level, the behavior of wires becomes a crucial
limiting factor in the ability to construct a reliable quantum
computer from faulty parts. While the Threshold Theorem
allows fault-tolerant quantum computers to be realized in
principle, we showed that in practice many assumptions must be
carefully scrutinized, particularly for implementation
technologies that force a 2-D layout scheme for qubits and their
interconnects. Our analysis suggests that, rather counterintuitively,
fault-tolerant constructions can be more resource efficient than
equivalent circuits made from more reliable components, when
the failure probability is a function of resources required. A
detailed study of the resources required to implement recursive
quantum error-correction circuits also highlights the crucial role
of qubit communication, and in particular, the dominant role of
SWAP gates. We find that at a certain level of recursion, resources
are minimized by choosing a teleportation channel instead of the
SWAP. It is likely that the reliability of the quantum SWAP
operator used in short-distance communication will be the dominant
factor in future quantum architecture system reliability.
Determining classically whether a coin is fair (head on one side,
tail on the other) or fake (heads or tails on both sides) requires an
examination of each side. However, the analogous quantum
procedure (the Deutsch-Jozsa algorithm1,2) requires just one
examination step. The Deutsch-Jozsa algorithm has been realized
experimentally using bulk nuclear magnetic resonance
techniques3,4, employing nuclear spins as quantum bits (qubits).
In contrast, the ion trap processor utilises5 motional and
electronic quantum states of individual atoms as qubits, and in
principle is easier to scale to many qubits. Experimental advances
in the latter area include the realization of a two-qubit quantum
gate6, the entanglement of four ions7, quantum state engineering8
and entanglement-enhanced phase estimation9. Here we exploit
techniques10,11 developed for nuclear magnetic resonance to
implement the Deutsch-Jozsa algorithm on an ion-trap quantum
processor, using as qubits the electronic and motional states of a
single calcium ion. Our ion-based implementation of a full
quantum algorithm serves to demonstrate experimental
procedures with the quality and precision required for complex
computations, confirming the potential of trapped ions for
quantum computation.

Laser-cooled trapped ions are ideally suited to the investigation
and implementation of quantum information processing12 because
they exhibit these properties: (1) localization of the single particle to
less than a few tens of nanometres13-15; (2) control of the motional
state down to the zero point of the trapping potential8,16; (3) a high
degree of isolation from the environment and thus a very long time
available for manipulations of their quantum state17; and (4) the
ability to detect the ion's quantum state with high precision by the
electron shelving technique18. The same properties make single
trapped ions well suited for storing quantum information in
long-lived internal states19.

In  our  experiment  we  implement  the  Deutsch-Jozsa  algorithm 
on  a  quantum  processor  based  on  a  single  trapped  40Ca+  ion  which 
is  driven  by  laser  pulses.  A  compensation  technique  for  frequency 
shifts  allows  us  to  achieve  the  required  control  over  the  optical 
phases  of  the  pulses20.  Following  a  recent  proposal10,  we  also 
successfully  combine  ion-trap  techniques  for  quantum  state 


Table 1 Truth table for the four possible functions

            Constant functions      Balanced functions
            Case 1      Case 2      Case 3      Case 4
f(0)        0           1           0           1
f(1)        0           1           1           0
w⊕f(a)      ID          NOT         CNOT        Z-CNOT

The third line is the effect of the logic function U_fn on the qubit w; ID denotes the identity, CNOT is a
controlled NOT operation, Z-CNOT is a zero controlled NOT, and the control bit in cases 3 and 4
is the input bit a.


manipulation with the method of composite pulses11 adopted
from NMR technology. Thus we achieve complete control over
the ion's motional and electronic state. The implementation of a
quantum algorithm on an ion-trap processor, which we
demonstrate here, serves as a test of the suitability of these techniques,
particularly in view of their scalability towards a larger number of
qubits.

To illustrate the Deutsch-Jozsa algorithm, we represent the four
possible coins by four functions f that map one input bit (a = 0,1
standing for 'which side of the coin') onto one output bit
(f(a) = 0,1 standing for 'head or tail'). These functions can be
divided into two constant functions f_1(a) = 0, f_2(a) = 1, representing
the fake coins, and two balanced functions f_3(a) = a, f_4(a) =
NOT a, which stand for the fair coins (see Table 1). An unknown
function is characterized as constant or balanced by evaluating
f(0)⊕f(1), which yields 0 (or 1) for a constant (or balanced)
function (⊕ denotes addition modulo 2). This evaluation classically
requires two function calls, whereas the Deutsch-Jozsa quantum
algorithm allows us to obtain the desired information with a single
evaluation of the unknown f. The circuit diagram shown in Fig. 1
describes the implementation of the Deutsch-Jozsa algorithm with
basic quantum operations21. The two qubits required for the
Deutsch-Jozsa algorithm are encoded in the electronic state and
in the phonon (vibrational quantum) number of the axial vibration
mode of the single trapped ion (see Fig. 2). Qubit operations are
realized by applying laser pulses on the 'carrier' or the 'blue
sideband' of the electronic quadrupole transition as described in the
Methods.
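
The logic of the one-bit Deutsch-Jozsa algorithm can be checked with a small state-vector simulation. The sketch below is an idealized, noise-free illustration and not the experimental pulse sequence: it uses Hadamard gates as a stand-in for the R_y rotations of the experiment (an assumption), and the function names are hypothetical. It starts from |a,w⟩ = |0,1⟩, queries the function once, and returns the probability of measuring a = 1.

```python
import math


def hadamard(state, qubit):
    """Apply a Hadamard gate to qubit 0 (a) or qubit 1 (w) of a 2-qubit state."""
    h = 1.0 / math.sqrt(2.0)
    new = {key: 0.0 for key in state}
    for (a, w), amp in state.items():
        bit = a if qubit == 0 else w
        for out in (0, 1):
            sign = -1.0 if (bit == 1 and out == 1) else 1.0
            key = (out, w) if qubit == 0 else (a, out)
            new[key] += sign * h * amp
    return new


def deutsch_jozsa_one_bit(f):
    """Return P(a = 1) after a single query of f; 0 if constant, 1 if balanced."""
    state = {(a, w): 0.0 for a in (0, 1) for w in (0, 1)}
    state[(0, 1)] = 1.0                      # inputs |a> = |0>, |w> = |1>
    state = hadamard(hadamard(state, 0), 1)  # create the superpositions
    queried = {key: 0.0 for key in state}
    for (a, w), amp in state.items():        # U_f: |a, w> -> |a, w XOR f(a)>
        queried[(a, w ^ f(a))] += amp
    state = hadamard(queried, 0)             # final rotation on |a>
    return sum(amp ** 2 for (a, _), amp in state.items() if a == 1)


FUNCTIONS = {1: lambda a: 0, 2: lambda a: 1, 3: lambda a: a, 4: lambda a: 1 - a}
for case, f in FUNCTIONS.items():
    print(case, round(deutsch_jozsa_one_bit(f), 6), f(0) ^ f(1))
```

For each of the four cases of Table 1 the measured probability equals f(0)⊕f(1), obtained with one call to U_f rather than the two classical evaluations.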

In  general,  a  quantum  algorithm  is  implemented  by  a  sequence  of 
such  pulses  on  the  carrier  and  sideband,  but  two  major  sources  of 
error  have  to  be  overcome.  First,  as  the  simplest  algorithms  already 
require  several  pulses,  we  need  to  control  precisely  the  relative 
optical  phases  of  these  pulses  or,  at  least,  to  keep  track  of  them  such 
that  the  required  pulse  sequences  lead  to  the  desired  operations.  In 
particular,  this  requires  the  precise  investigation  and  subsequent 
compensation  of  all  phases  introduced  by  the  light  shifts  of  the 
exciting  laser  beams.  These  light  shifts  arise  as  we  have  to  drive 


Figure 1 Quantum circuit for implementing the Deutsch-Jozsa algorithm with basic
quantum operations. The upper line shows the input qubit |a⟩ ('which side of the coin'
information), the lower line an auxiliary working qubit |w⟩ (corresponding to the channel
on which the answer is provided). The rotations R_y (see Methods for details) create
superpositions |a⟩_1 = (|0⟩ + |1⟩)/√2 and |w⟩_1 = (|0⟩ - |1⟩)/√2 from the inputs
|a⟩_0 = |0⟩ and |w⟩_0 = |1⟩. The box U_fn represents a unitary operation specific to each of
the functions f_n, which applies f_n to a and adds the result to w modulo 2. Table 1 lists
the logic operations required for transforming |w⟩ into |w⊕f_n(a)⟩. The output of the box
is |a,w⟩_2 = (|0, w_1⊕f_n(0)⟩ + |1, w_1⊕f_n(1)⟩)/√2. Up to an overall sign |w⟩ is left
unchanged, but the positive superposition (|0⟩ + |1⟩)/√2 on |a⟩ is transformed into a
negative superposition |a⟩_2 = (|0⟩ - |1⟩)/√2 if f is balanced; otherwise it is
unchanged. After the final rotations R_y, a measurement on |a⟩ is performed with
result |a⟩_3 = either |0⟩ or |1⟩. Because of the sign change in |a⟩_2 if f is balanced,
|⟨1|a⟩_3|^2 = f_n(0)⊕f_n(1), that is, |a⟩_3 yields the desired information whether the
function f_n is balanced or constant. The working qubit w resumes its initial value
|w⟩_3 = |w⟩_0 = |1⟩.
sideband transitions (which couple much more weakly than carrier
transitions) with high laser intensity. We cancel the unwanted light
shifts with an additional off-resonant laser field, inducing a light
shift of equal strength but opposite sign20.

Second, a peculiarity of encoding a qubit within the ion's
motional state is that we must ensure that the system does not
leave the computational subspace {|S,0_z⟩, |D,0_z⟩, |S,1_z⟩, |D,1_z⟩}
(for notation, see Methods). The main problem here is that owing to
the degenerate spectrum of a harmonic oscillator, sideband pulses
work simultaneously on all levels. Therefore any population in
|S,1_z⟩ prior to a blue sideband pulse will leave the computational
subspace. To avoid this, we use composite pulses, that is, a sequence
of carrier and/or sideband pulses that (up to an overall phase)
constrain the system to the subspace10. We adopted this method
from NMR technology11. The translation of the Deutsch-Jozsa
algorithm into composite pulses acting on the two qubits is
described in the Methods.

For our experiments we load Ca ions into a linear Paul trap with
axial frequency ω_z ≈ 2π × 1.7 MHz. Figure 2 shows the relevant
optical transitions used for laser cooling, state preparation and
detection. Each experimental cycle starts with Doppler cooling for
2 ms on the S1/2 → P1/2 transition yielding average vibrational
quantum numbers n_z ≈ 20. Further cooling of the axial motion
to a ground state occupation of more than 99% is achieved by about
12 ms of sideband cooling8. To initialize the quantum processor in
|01⟩ = |S,0_z⟩, we optically pump the ion to the S1/2 (m = -1/2)
state. Manipulations of both qubits are achieved by pulses from a
stabilized titanium-sapphire laser (linewidth < 100 Hz, relative
intensity noise < 0.02 r.m.s.) emitting at the S1/2 → D5/2 transition
wavelength near 729 nm. In order to switch between R and R+
rotations we shift the laser frequency with an acousto-optical
modulator. The phase of the light field is switched via the phase
of the radio frequency driving the acousto-optical modulator with
an inaccuracy of less than 0.06 rad. Using the electron shelving
technique8 we detect the ion's electronic state (S1/2 or D5/2) with a
fidelity of 99.9% within a detection time of 3 ms.

We measure the fidelity of the implemented algorithm by
repeating the experimental sequence of cooling, initialization of
both qubits, laser pulses for the algorithm and final measurement
several thousand times. Table 2 displays the achieved results. For
cases 1, 3 and 4, the fidelity of identifying the function's class with a
single measurement exceeds 97%; for case 2, it is above 90%. Note


Figure 2 Quantum mechanical energy levels relevant for the ion-trap quantum computer.
a, Ca+ level scheme. The upper and lower electronic states S1/2 (m = -1/2) and D5/2
(m = -1/2) of the narrow quadrupole transition (τ_D ≈ 1 s) at 729 nm serve to
implement one of the qubits, |a⟩. Coherent radiation of a titanium-sapphire laser at
729 nm drives the qubit transition. Lasers at 397 nm, 866 nm and 854 nm are used for
the excitation of resonance fluorescence, for Doppler cooling, and optical pumping. The
laser system is described in detail elsewhere19. b, The lowest two number states,
n_z = 0_z, 1_z, of the axial vibrational motion in the trap form the other qubit, |w⟩. c, The
combination of electronic states and energy eigenstates of the harmonic oscillator
potential span the computational subspace. Numbers in ket notation denote the quantum
logical values assigned to the respective states. Solid lines show carrier transitions;
dashed lines show blue sideband transitions.


Table 2 Expected and measured results of the complete Deutsch-Jozsa algorithm

                        Constant                Balanced
                        Case 1      Case 2      Case 3      Case 4
Expected |⟨1|a⟩|^2      0           0           1           1
Measured |⟨1|a⟩|^2      0.019(6)    0.087(6)    0.975(4)    0.975(2)
Expected |⟨1|w⟩|^2      1           1           1           1
Measured |⟨1|w⟩|^2      -           0.90(1)     0.931(9)    0.986(4)

The numbers in brackets are statistical 1σ uncertainties.


that to decide whether the function is constant or balanced, only
|⟨1|a⟩_3|^2 at the end of the algorithm needs to be measured. We also
verified that the working qubit |w⟩ is reset to its initial value by
reading out the phonon number through a measurement of the
Rabi frequency of the blue sideband transition8,16.
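
The single-measurement identification fidelities quoted in the text follow directly from the measured |⟨1|a⟩|^2 values in Table 2: for a constant function the correct outcome is a = 0, so the fidelity is one minus the measured value, and for a balanced function it is the measured value itself. The sketch below is illustrative; it uses only the central values from Table 2, ignores the quoted uncertainties, and the names are hypothetical.

```python
# Measured |<1|a>|^2 from Table 2 (central values only).
MEASURED_P1 = {1: 0.019, 2: 0.087, 3: 0.975, 4: 0.975}
CONSTANT_CASES = {1, 2}  # cases 3 and 4 are balanced


def identification_fidelity(case, p1):
    """Probability that one measurement of |a> identifies the function class."""
    return 1.0 - p1 if case in CONSTANT_CASES else p1


fidelity = {case: identification_fidelity(case, p1)
            for case, p1 in MEASURED_P1.items()}
for case, value in fidelity.items():
    print(f"case {case}: {value:.3f}")
```

This reproduces the statement that cases 1, 3 and 4 exceed 97% while case 2 lies above 90%.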

The measured output of the algorithm shown in Table 2 slightly
deviates from the ideal result. We identified the major sources for
this infidelity and attribute it mainly to decoherence of the
laser-atom phase, in particular caused by ambient magnetic field
fluctuations22. Furthermore, in the implementation of case 2, which
requires the most complex pulse sequence, we used higher laser
power of the sideband transitions in order to speed up the algorithm
and thus reduce the sensitivity to phase decoherence. This in turn
caused off-resonant carrier excitation which limited the obtainable
fidelity.

A major advantage of our state detection technique is the ability
to follow the evolution of |⟨1|a⟩|^2 during the quantum algorithm.
For this, we truncate the pulse sequence at a certain time t and reveal
|⟨1|a(t)⟩|^2 by measuring the probability of finding the ion in the
D5/2 state. In Fig. 3 we display this probability as a function of time
for all four cases. The data agree very well with the calculated ideal


Figure 3 Time evolution of |⟨1|a⟩|^2. Points are the probabilities, each inferred from
100 measurements; the line shows the ideal evolution. No parameters were adjusted
to fit the data. The implementation of the functions takes place between
the dashed lines. An initial R_ya and a final R_ya rotation on |a⟩, implemented by carrier
pulses, complete the algorithm. Taking case 3 as an example, R_ya lasts from
12 μs to 22 μs. Then R_yw U_fn R_yw on |a,w⟩ is implemented from 54 μs to 212 μs
with the laser tuned to the blue sideband. The laser phase is switched at 87, 133
and 166 μs according to Table 3. The final R_ya pulse is applied from 240 to
Table 3 Implementations of R_yw U_fn R_yw

Logic 

Laser  pulses 

U 

RywRyw 

No  pulses 

R-yw  SWAP”1 11  NOTa  SWAP  R 

*  r+(a°)r+(5§’^)r+(A°) 

R(l,0)R(x,|)R(!,x) 

R+(^’1r)Rf(^l  .■'  +  ^swap)R+(^,x) 

^3 

Ryw  CNOT  Ryw 

Rf(*>°)Rf(1''l)Rf(*>°)RfCl) 

U 

Ryw  Z-CNOT  RYw 

R(ir ,  0)R+  ,  0)  R+  (* ,  f)R+  ,  0)  R+  (it  ,  f)R(ir ,  0) 

The rotation angle for R+(θ,φ) is given for the |10⟩ → |01⟩ transition. θ and φ denote the pulse
duration and phase, respectively. φ_swap = arccos(cot^2(π/√2)), where the SWAP operation is
explained in the Methods.
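
The phase φ_swap in the table note can be evaluated numerically. The sketch below simply evaluates the expression arccos(cot^2(π/√2)) as read from the note, using the Python standard library; the variable names are illustrative.

```python
import math

# cot(x) = 1 / tan(x), evaluated at x = pi / sqrt(2)
cot_value = 1.0 / math.tan(math.pi / math.sqrt(2.0))

# The argument of arccos must lie in [-1, 1]; cot^2 here is between 0 and 1.
phi_swap = math.acos(cot_value ** 2)

print(f"cot^2(pi/sqrt(2)) = {cot_value ** 2:.4f}")
print(f"phi_swap = {phi_swap:.4f} rad")
```

The resulting phase is slightly below 1 rad, that is, roughly 0.3π.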


evolution (solid lines in Fig. 3, no fit parameters), demonstrating
the high precision of the applied pulse sequence, especially the
control over the optical phases.

The results demonstrate a high degree of control of all relevant
experimental parameters, that is, laser frequency and intensity,
optical phases, and trap frequency ω_z, over long pulse sequences.
Therefore, the procedures presented here pave the way for
implementing more complex algorithms and for scaling the system to
multi-qubit operation. In particular, the light shift compensation
technique demonstrated in this experiment can be directly
transferred and advantageously applied to a several-qubit quantum
processor. This technique will become increasingly important for
scaling such a system because as the ion crystal becomes heavier,
the higher laser intensities required to drive sideband transitions
result in increased light shifts. Furthermore, by merging the
composite pulse technique with our trapped-ion quantum
computer we gain full access to all gate operations on the motional
qubit. The employed composite-pulse phase gate also simplifies
the Cirac-Zoller scheme5 for a universal set of quantum gates, by
dispensing with the auxiliary level transition. Thus our procedures
become applicable to a wider choice of ion species including
43Ca+, which offers a potentially much longer coherence time
than 40Ca+.

Methods 

Encoding  of  qubits  and  single-qubit  rotations 

The two qubits required for the Deutsch-Jozsa algorithm are encoded in the electronic
quantum state ($S_{1/2}(m = -1/2) = |0\rangle = |S\rangle$ and $D_{5/2}(m = -1/2) = |1\rangle = |D\rangle$) and in
the phonon number of the axial vibration mode of the single trapped ion ($n_z = 0_z = |1\rangle$
and $n_z = 1_z = |0\rangle$). Note the counterintuitive encoding of the vibrational mode, which
simplifies the desired initial state preparation in $|01\rangle = |S,0_z\rangle$. The operations which
modify the electronic qubit ('single-qubit rotations') are performed with laser pulses on
the carrier ($|S,n_z\rangle \leftrightarrow |D,n_z\rangle$) transition, that is, no change of vibrational quantum
number, laser on resonance. To connect the two qubits ('two-qubit rotations') the laser is
detuned by $+\omega_z$ from the $|S\rangle \leftrightarrow |D\rangle$ resonance to the 'blue sideband' ($|S,n_z\rangle \leftrightarrow |D,n_z+1\rangle$)
as indicated in Fig. 2. Qubit rotations can be written as unitary operations in the following
way (ref. 12):

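Carrier rotations are given by

$$R(\theta,\phi) = \exp\left[i\frac{\theta}{2}\left(e^{i\phi}\sigma^+ + e^{-i\phi}\sigma^-\right)\right]$$

whereas transitions on the blue sideband are denoted as

$$R^+(\theta,\phi) = \exp\left[i\frac{\theta}{2}\left(e^{i\phi}\sigma^+ b^\dagger + e^{-i\phi}\sigma^- b\right)\right]$$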

Here $\sigma^\pm$ are the atomic raising and lowering operators which act on the electronic
quantum state of the ion, that is, the first qubit, by inducing transitions from the $|S\rangle$ to $|D\rangle$
state and vice versa (notation: $\sigma^+ = |D\rangle\langle S|$). The operators $b$ and $b^\dagger$ stand for the
annihilation and creation of a phonon at the trap frequency, that is, they work on the
motional quantum state, the second qubit. The parameter $\theta$ depends on the strength and
the duration of the applied pulse and $\phi$ is its phase, that is, the relative phase between the
optical field and the atomic polarization. We use the definitions $R_y = R(\pi/2, 0)$ and
$\bar{R}_y = R(\pi/2, \pi)$.
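As a quick numerical check of these definitions, the carrier rotation can be written as a 2×2 matrix. The sketch below is illustrative only: the function name `carrier_rotation` and the basis ordering ($|S\rangle$, $|D\rangle$) are assumptions made here, not part of the original work. It relies on the fact that $A = e^{i\phi}\sigma^+ + e^{-i\phi}\sigma^-$ satisfies $A^2 = 1$, so that $\exp(i\theta A/2) = \cos(\theta/2) + i\sin(\theta/2)A$.

```python
import numpy as np

def carrier_rotation(theta, phi):
    """R(theta, phi) as a matrix in the assumed basis order (|S>, |D>)."""
    sigma_plus = np.array([[0, 0], [1, 0]], dtype=complex)   # |D><S|
    sigma_minus = sigma_plus.conj().T                        # |S><D|
    a = np.exp(1j * phi) * sigma_plus + np.exp(-1j * phi) * sigma_minus
    # a @ a is the identity, so the matrix exponential has a closed form.
    return np.cos(theta / 2) * np.eye(2) + 1j * np.sin(theta / 2) * a

r_y = carrier_rotation(np.pi / 2, 0.0)        # R_y
r_y_bar = carrier_rotation(np.pi / 2, np.pi)  # R_y with a bar
product = r_y_bar @ r_y
```

Under these assumptions `product` is the identity matrix, which is the sense in which $\bar{R}_y$ undoes $R_y$.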

Translation  of  the  Deutsch-Jozsa  algorithm  into  composite  pulses 

The quantum circuit in Fig. 1 shows the quantum logic operations used for the
implementation and Table 1 lists the logic functions corresponding to the unitary
operations $U_{fn}$. The $R_y$ rotations on the electronic qubit $|a\rangle$ are carrier pulses. For efficient
computation we combine the rotations $\bar{R}_y$, $R_y$ on $|w\rangle$ and the manipulations for
implementing $U_{fn}$ into an optimized pulse sequence, $\bar{R}_{yw} U_{fn} R_{yw}$ (dashed box in Fig. 1). As
these operations act also on the motional state, we implement them with pulses on the
carrier and the blue axial sideband. However, sideband pulses operate on both qubits
simultaneously. Thus, for operations on $|w\rangle$ alone, we first swap the information from $|w\rangle$
into $|a\rangle$ with a sequence of three blue sideband pulses, then we rotate $|a\rangle$ as desired and
swap back.

For a swap operation one might be tempted to use a single $\pi$-pulse on the blue
sideband. However, applying this to the state $|00\rangle = |S,1_z\rangle$ leads to a population of states
with two phonons outside the computational subspace. Therefore we use a composite
pulse sequence consisting of three pulses, whose lengths are chosen such that starting from
$|S,1_z\rangle$ the ion is rotated by $\pi$, $2\pi$ and $\pi$, respectively. As a result the ion is rotated by $4\pi$
back to $|S,1_z\rangle$ independently of the pulses' relative phases. In addition, using the blue
sideband ensures that $|11\rangle = |D,0_z\rangle$ also stays unchanged as required for the swap
operation.

The desired swap operation $|S,0_z\rangle \leftrightarrow |D,1_z\rangle$ is possible because compared to the
$|S,1_z\rangle \leftrightarrow |D,2_z\rangle$ transition, the Rabi frequency for the $|S,0_z\rangle \leftrightarrow |D,1_z\rangle$ transition is smaller
by a factor of $1/\sqrt{2}$ (refs 8, 16). So in this manifold the three pulses' lengths correspond to rotation
angles of $\pi/\sqrt{2}$, $2\pi/\sqrt{2}$, $\pi/\sqrt{2}$. It can be shown that choosing the laser-atom phase of the
second pulse to be $\arccos(\cot^2(\pi/\sqrt{2})) = 0.3033\ldots\,\pi$ relative to the first and the third
pulses, the populations of $|10\rangle = |D,1_z\rangle$ and $|01\rangle = |S,0_z\rangle$ are exchanged. This realises the
desired swap. Table 3 (case 2) lists the complete pulse sequence for the implementation of
$\bar{R}_{yw} U_{f2} R_{yw}$. Similar procedures are applied to realise the pulse sequences for cases 3 and 4.
In these cases the rotations $\bar{R}_{yw}$, $R_{yw}$ and the operations required for $U_{f3}$, $U_{f4}$ can be
combined in such a way that swap operations become unnecessary.
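The composite swap described above can be checked numerically by treating each blue-sideband manifold as an independent two-level system. The following is a minimal sketch under that assumption; the helper names `two_level_rotation` and `composite_swap` are hypothetical, and the basis order within each manifold is taken to be ($|S,n\rangle$, $|D,n+1\rangle$).

```python
import numpy as np

def two_level_rotation(theta, phi):
    """Rotation by angle theta with phase phi inside one two-level manifold."""
    a = np.array([[0, np.exp(-1j * phi)], [np.exp(1j * phi), 0]])
    return np.cos(theta / 2) * np.eye(2) + 1j * np.sin(theta / 2) * a

def composite_swap(rabi_scale):
    """Three sideband pulses; rabi_scale rescales every angle for a given manifold."""
    alpha = np.pi / np.sqrt(2)
    phi_swap = np.arccos(1.0 / np.tan(alpha) ** 2)
    pulses = [(alpha, 0.0), (2 * alpha, phi_swap), (alpha, 0.0)]
    u = np.eye(2, dtype=complex)
    for theta, phi in pulses:                 # earlier pulses act first
        u = two_level_rotation(rabi_scale * theta, phi) @ u
    return u

lower = composite_swap(1.0)            # (|S,0z>, |D,1z>) manifold
upper = composite_swap(np.sqrt(2))     # (|S,1z>, |D,2z>), Rabi frequency larger by sqrt(2)
swap_population = abs(lower[1, 0]) ** 2      # |S,0z> transferred to |D,1z>
return_population = abs(upper[0, 0]) ** 2    # |S,1z> returned to itself
phi_swap_over_pi = np.arccos(1.0 / np.tan(np.pi / np.sqrt(2)) ** 2) / np.pi
```

In this toy model the lower manifold is fully exchanged while the upper manifold returns to its starting state, and the phase evaluates to about 0.3033 in units of $\pi$, in line with the value quoted in the text.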
We report the realization of a nuclear magnetic resonance computer with three quantum bits that
simulates  an  adiabatic  quantum  optimization  algorithm.  Adiabatic  quantum  algorithms  offer  new 
insight  into  how  quantum  resources  can  be  used  to  solve  hard  problems.  This  experiment  uses  a 
particularly  well-suited  three  quantum  bit  molecule  and  was  made  possible  by  introducing  a  technique 
that  encodes  general  instances  of  the  given  optimization  problem  into  an  easily  applicable  Hamiltonian. 

Our  results  indicate  an  optimal  run  time  of  the  adiabatic  algorithm  that  agrees  well  with  the  prediction 
of  a  simple  decoherence  model. 


DOI: 10.1103/PhysRevLett.90.067903

Since the discovery of the algorithms of Shor [1] and
Grover [2], the quest to find new quantum algorithms
has proved a formidable challenge. Recently, however, a
novel algorithm was proposed, using adiabatic evolution
[3,4]. Despite the uncertainty in its scaling behavior, this
algorithm remains a remarkable discovery because it
offers new insights into the potential usefulness of quantum
resources for computational tasks.

Experimental realizations of quantum algorithms in
the past demonstrated Grover's search algorithm, the
Deutsch-Jozsa algorithm, order finding, and Shor's algorithm
[5,6]. Recently, Hogg's algorithm was implemented
using only one computational step [7]; however, a demonstration
of an adiabatic quantum algorithm thus far has
remained beyond reach.

Here,  we  provide  the  first  experimental  implementation 
of  an  adiabatic  quantum  optimization  algorithm  using 
three  qubits  and  nuclear  magnetic  resonance  (NMR) 
techniques  [8].  NMR  techniques  are  especially  attractive 
because  several  tens  of  qubits  may  be  accessible,  which  is 
precisely  the  range  that  could  be  crucial  in  determining 
the  scaling  behavior  of  adiabatic  quantum  algorithms  [9]. 
Compared  to  earlier  implementations  of  search  problems 
[5,10],  this  experiment  is  a  full  implementation  of  a  true 
optimization  problem  which  does  not  require  a  black  box 
function  or  ancilla  bits. 

This experiment was made possible by overcoming two
experimental challenges. First, an adiabatic evolution
requires a smoothly varying Hamiltonian over
time, but the terms of the available Hamiltonian in our
system cannot be smoothly varied and may even have
fixed values. We developed a method to approximately
smoothly vary a Hamiltonian despite the given restrictions
by extending NMR average Hamiltonian techniques
[11]. Second, general instances of the optimization algorithm
may require the application of Hamiltonians that
are not easily accessible. We developed methods to implement
general instances of a well-known classical NP-complete
(nondeterministic, polynomial time) optimization
problem given a fixed natural system Hamiltonian.

We provide a concrete procedure detailing these methods.
We then apply the results to the Maximum Cut
(MAXCUT) [12] optimization problem. Our experiments
indicate there exists an optimal total running time which
can be predicted using a decoherence model based on
independent stochastic relaxation of the spins.

An adiabatic quantum algorithm evolves the quantum
state with a slowly varying, time-dependent Hamiltonian.
Suppose we are given some time-dependent Hamiltonian
$H(t)$, where $0 \le t \le T$, and at $t = 0$ we start in the ground
state of $H(0)$. By varying $H(t)$ slowly, the quantum system
remains in the ground state of $H(t)$ for all $0 \le t \le T$
provided the lowest two energy eigenvalues of $H(t)$ are
never degenerate [13]. Now suppose we can encode an
optimization problem into $H(T)$. Then the state of the
quantum system at time $t = T$ represents the solution
to the optimization problem [3]. The total run time $T$ of
the adiabatic algorithm scales as $g_{min}^{-2}$, where $g_{min}$ is
the minimum separation between the lowest two energy
eigenvalues of $H(t)$ [3,14]. The scaling behavior of $g_{min}$
will ultimately determine the success of adiabatic quantum
algorithms. Classical simulations of this scaling
behavior are hard due to the exponentially growing size
of Hilbert space. In contrast, sufficiently large quantum
computers could simulate this behavior efficiently.

Smoothly varying some time-dependent Hamiltonian
appears straightforward but contrasts with the traditional
picture of discrete unitary operations including fault-tolerant
quantum circuit constructions [15]. Fortunately, we
can approximate a smoothly varying Hamiltonian using
methods of quantum simulations [16] and recast adiabatic
evolution in terms of unitary operations.

Discretizing a continuous Hamiltonian is a straightforward
process and changes the run time $T$ of the adiabatic
algorithm only polynomially [14]. For simplicity, let the
discrete time Hamiltonian $H[m]$ be a linear interpolation
from some beginning Hamiltonian $H[0] = H_b$ to some
final problem Hamiltonian $H[M] = H_p$ such that
$H[m] = (m/M)H_p + (1 - m/M)H_b$. The unitary evolution
of the discrete algorithm can be written as

$$U = \prod_m U_m = \prod_m e^{-i[(m/M)H_p + (1 - m/M)H_b]\Delta t}, \qquad (1)$$

where $\Delta t = T/(M + 1)$, and $M + 1$ is the total number of
discretization steps. The adiabatic limit is achieved when
both $T, M \to \infty$ and $\Delta t \to 0$.

Full control over the strength of $H_b$ and $H_p$ is needed to
implement Eq. (1). However, this may not necessarily be a
realistic experimental assumption. We will next show how
the discrete time adiabatic algorithm can still be implemented
when $H_b$ and $H_p$ cannot both be applied simultaneously
and when they are both fixed in strength.

When both $H_b$ and $H_p$ are fixed, we can approximate
$U_m$ to second order by using the Trotter formula
$\exp[(A + B)\Delta t] = \exp(A\Delta t/2)\exp(B\Delta t)\exp(A\Delta t/2) + O(\Delta t^2)$ [16].
Higher order approximations can be constructed
if more accuracy is required.
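The behavior of this symmetric splitting can be illustrated on a toy single-qubit example. The sketch below is an assumption-laden illustration, not the authors' code: it takes $A = -i\sigma_x$ and $B = -i\sigma_z$ as stand-ins for the two non-commuting generators, and the helper names `evolve` and `trotter_error` are hypothetical.

```python
import numpy as np

def evolve(h, t):
    """exp(-i h t) for a Hermitian matrix h, via its eigendecomposition."""
    w, v = np.linalg.eigh(h)
    return v @ np.diag(np.exp(-1j * w * t)) @ v.conj().T

def trotter_error(h_a, h_b, dt):
    """Spectral-norm distance between the exact step and the symmetric split step."""
    exact = evolve(h_a + h_b, dt)
    half_a = evolve(h_a, dt / 2)
    split = half_a @ evolve(h_b, dt) @ half_a
    return np.linalg.norm(exact - split, 2)

pauli_x = np.array([[0, 1], [1, 0]], dtype=complex)
pauli_z = np.array([[1, 0], [0, -1]], dtype=complex)
err_coarse = trotter_error(pauli_x, pauli_z, 0.1)
err_fine = trotter_error(pauli_x, pauli_z, 0.05)
ratio = err_coarse / err_fine
```

In this toy case halving $\Delta t$ reduces the single-step error by roughly a factor of eight, which is comfortably within the remainder bound quoted above.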

Now suppose $H_b$ and $H_p$ are both constant. Since any
unitary matrix is generated by an action $-iH\Delta t$, we can
increase the effect of a constant Hamiltonian $H$ by
lengthening the time $\Delta t$. Thus, we can implicitly increase
the strength of $H_b$ and $H_p$ even when they are constant by
simply increasing the time during which they are applied.

This technique also allows cases when the accessible
Hamiltonians are not of the required strength, for example,
when we are given $H'_b = gH_b$ and $H'_p = hH_p$ but
still wish to implement $H_b$ and $H_p$. Using all of the
described techniques, we can now write $U_m$ as

$$U_m \approx e^{-iH'_b[(1 - m/M)\Delta t/2g]} \circ e^{-iH'_p[(m/M)\Delta t/h]}, \qquad (2)$$

where $A \circ B = ABA$. Each discretization step is of length
$(1 - m/M)\Delta t/g + (m/M)\Delta t/h$, which is not constant
when $g \ne h$. As an illustration consider Fig. 1(a).
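The step lengths implied by Eq. (2) can be tabulated directly. The short sketch below is only an illustration; the function names and the example values of $M$, $\Delta t$, $g$ and $h$ are assumptions chosen for readability, not the experimental settings.

```python
def slice_durations(m, big_m, dt, g=1.0, h=1.0):
    """Return (total time under H'_b, time under H'_p) for discretization step m."""
    frac = m / big_m
    # H'_b is applied in two halves of (1 - m/M) dt / 2g each; this is their sum.
    return (1 - frac) * dt / g, frac * dt / h

def total_run_time(big_m, dt, g=1.0, h=1.0):
    """Sum of all slice lengths over the M + 1 steps m = 0 .. M."""
    return sum(sum(slice_durations(m, big_m, dt, g, h)) for m in range(big_m + 1))

equal_strength = total_run_time(big_m=10, dt=0.5)                 # g = h = 1
unequal_strength = total_run_time(big_m=10, dt=0.5, g=2.0, h=1.0)
```

With $g = h = 1$ every slice has length $\Delta t$ and the total is $(M + 1)\Delta t$; with $g \ne h$ the slices differ in length, as noted above.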

We choose $\Delta t = T/(M + 1)$ to be constant as we vary
the number of discretization steps $M + 1$. This way, the
total run time $T$ increases with $M + 1$, allowing us to test
the behavior of the algorithm when approaching one of
the conditions for the adiabatic limit. Even when the
discrete approximation is not close to the adiabatic limit,
the implemented algorithm can often find solutions using
relatively few steps but lacks the guaranteed performance
of the adiabatic theorem [17].

Adiabatic evolution has been proposed to solve general
optimization problems, including NP-complete ones. In
this general setting, the algorithm can depend on the
existence of a black box function or the usage of large
amounts of workspace. Our goal here is to optimize a
hard natural problem in a way that avoids these difficulties.
We will first describe which problem we chose and
later explain why it does not require ancilla qubits.

FIG. 1. (a) Illustration of Eq. (2). The shaded and clear boxes
denote the strength and duration of the Hamiltonians $H_b$ and
$H_p$, respectively. (b) Illustration of a graph consisting of three
nodes and three edges. The edges carry weights $w_{12}$, $w_{13}$, and
$w_{23}$. When  = $w_{23}$ as indicated by the length of the
edges, the MAXCUT corresponds to the drawn cut. The solution
is therefore $s = 100$ and also $s = 011$ due to symmetry.
This symmetry can be broken by assigning the weights $w_1$, $w_2$,
and $w_3$ to the nodes.

We found the MAXCUT problem to be a well-suited
problem to demonstrate an adiabatic quantum algorithm
because it allows a variety of interesting test cases. It also
appears in the study of spin glasses [18], among others.
The decision variant of the MAXCUT problem is part of
the core NP-complete problems [12], and even the approximation
within a factor of 1.0624 of the perfect
solution is NP-complete [19].

The MAXCUT problem can be understood as follows.
A cut is defined as the partitioning of an undirected
$n$-node graph with edge weights into two sets. We define
the payoff as the sum of weights of edges crossing the cut.
The maximum cut is a cut that maximizes this payoff. By
assigning either $s_i = 0$ or $s_i = 1$ to each node $i$, depending
on its location with respect to the cut, the MAXCUT
problem can be restated as finding the $n$-bit number $s$ that
maximizes the payoff. An extension of the MAXCUT
problem is to let the nodes themselves carry weights,
which can be regarded as the nodes having a preference
on their location. As an illustration consider a graph with
three nodes as drawn in Fig. 1(b).

The payoff as a function of the cut defined by $s$ is

$$P(s) = \sum_i w_i s_i + \sum_{i,j} s_i(1 - s_j)w_{ij}, \qquad (3)$$

where $w_{ij}$ are the edge weights, $w_i$ denotes the node
weights, and $s_i$ is the value of the $i$th bit of $s$.

The smallest meaningful test case of the MAXCUT
problem requires three nodes and admits a variety of
interesting cases by varying $w_{ij}$ and $w_i$. We aimed at
two goals when choosing a representative set of weights.
First, we wanted the minimum energy gap $g_{min}$ to be
smaller than the one for a three-qubit adiabatic Grover
search. Second, we wanted a resulting energy landscape
with both a global and local maximum such that a greedy
classical search would incorrectly find the local maximum
half the time [20]. These goals are met by the choice
$w_1 = w_2 = w_3 = 2$, $w_{12} = 2$, $w_{13} = 1$, and $w_{23} = 3$.

The payoff function for this set of weights is $P(s) =
[0\ 6\ 7\ 7\ 5\ 9\ 8\ 6]$, where $s = [000\ 001\ 010\ 011\ 100\ 101\
110\ 111]$. The global maximum lies at $s = 101$, not at the
local maximum $s = 110$, so the answer on the quantum
computer following measurement should be $|101\rangle$.
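The quoted payoff values can be reproduced by brute-force enumeration of all eight cuts. The sketch below is illustrative; the dictionary layout and the function name `payoff` are choices made here, and the edge term is written in the equivalent form "add the weight of every edge whose endpoints lie on opposite sides of the cut".

```python
from itertools import product

node_w = {1: 2, 2: 2, 3: 2}
edge_w = {(1, 2): 2, (1, 3): 1, (2, 3): 3}

def payoff(bits):
    """Payoff of the cut s = s1 s2 s3 for the weights given in the text."""
    s = dict(zip((1, 2, 3), bits))
    nodes = sum(node_w[i] * s[i] for i in s)
    # An edge contributes only when its two endpoints are on opposite sides.
    edges = sum(w for (i, j), w in edge_w.items() if s[i] != s[j])
    return nodes + edges

table = {"".join(map(str, b)): payoff(b) for b in product((0, 1), repeat=3)}
best = max(table, key=table.get)
```

The enumeration also makes the local maximum visible: every single-bit flip away from 110 lowers the payoff, even though 101 scores higher overall.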

In the quantum setting, this payoff function $P(s)$ can be
encoded into the Hamiltonian $H_p$ by rewriting Eq. (3)
using Pauli matrices:

$$H_p = \sum_i w_i(I - \sigma_{zi})/2 + \sum_{i<j} w_{ij}(I - \sigma_{zi}\sigma_{zj})/2, \qquad (4)$$

where $I$ is the $2^n \times 2^n$ identity matrix and $\sigma_{zi}$ is the Pauli
$Z$ matrix on spin $i$. The identity matrices in the equation
above only lead to an overall phase which cannot be
observed and, hence, they can be ignored. The diagonal
values of Eq. (4) are equal to $P(s)$. Because of the direct
encoding of $P(s)$ into $H_p$, no black box function or ancilla
qubits are required, which makes this a full implementation
of an optimization problem.
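The claim that the diagonal of the problem Hamiltonian reproduces the payoff can be verified by building the 8×8 matrix explicitly. This is a minimal sketch; the helper `z_on` and the convention that spin 1 is the most significant bit are assumptions made for the illustration.

```python
import numpy as np
from functools import reduce

I2 = np.eye(2)
Z = np.diag([1.0, -1.0])

def z_on(spins, n=3):
    """Tensor product with Pauli Z on the listed spins (1-indexed), identity elsewhere."""
    return reduce(np.kron, [Z if k in spins else I2 for k in range(1, n + 1)])

node_w = {1: 2, 2: 2, 3: 2}
edge_w = {(1, 2): 2, (1, 3): 1, (2, 3): 3}
identity = np.eye(8)

# Node terms w_i (I - Z_i)/2 plus edge terms w_ij (I - Z_i Z_j)/2.
h_p = sum(w * (identity - z_on({i})) / 2 for i, w in node_w.items())
h_p = h_p + sum(w * (identity - z_on({i, j})) / 2 for (i, j), w in edge_w.items())
diagonal = np.diag(h_p)
```

The resulting matrix is diagonal, and its entries in the order 000 to 111 match the payoff list given earlier.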

Similar to Eq. (4), the natural Hamiltonian of $n$
weakly coupled spin-1/2 nuclei subject to a static magnetic
field $B_0$ is well approximated by [21]

$$\mathcal{H} = -\sum_i \omega_i\sigma_{zi}/2 + \sum_{i<j} \pi J_{ij}\sigma_{zi}\sigma_{zj}/2 + \mathcal{H}_{env}, \qquad (5)$$

where the first term represents the Larmor precession of
each spin $i$ about $-B_0$, and $\omega_i$ is its Larmor frequency.
The second term describes the scalar spin-spin coupling
of strength $J_{ij}$ between spins $i$ and $j$. The last term
represents coupling to the environment, causing decoherence.
Note the resemblances between $\mathcal{H}$ and $H_p$.

Despite the similarities, the spin-spin couplings of
Eq. (5) are generally different from a randomly chosen
set of weights. Therefore, we require a procedure to turn
the fixed $J_{ij}$ into any specified weights $w_{ij}$. This is
achieved using refocusing schemes that are typically
used to turn on only one of the couplings while turning
all others off [21].

We have modified a refocusing scheme to effectively
change the couplings to any arbitrary value. Consider the
pulse sequence drawn in Fig. 2. Based on this scheme, we
can derive the underconstrained system
$(\alpha + \beta - \gamma - \delta)J_{12} = w_{12}$, $(\alpha - \beta - \gamma + \delta)J_{13} = w_{13}$, and
$(\alpha - \beta + \gamma - \delta)J_{23} = w_{23}$, which can be solved for positive $\alpha$, $\beta$,
$\gamma$, and $\delta$ such that $J_{ij} \to w_{ij}$.

The single weights $w_i$ are implemented by introducing
a reference frame for each spin $i$ which rotates about $-B_0$
at frequency $(\omega_i - w_i)/2$. In order to apply the single
qubit rotations of our refocusing scheme on resonance,
we apply the reference frequency shift only during the
delay segment $\alpha$, which we can always choose to be a
positive value. Thus, $H_p$ is implemented by applying the
refocusing scheme from Fig. 2 while going off resonance
during the delay segment $\alpha$.


FIG. 2. Refocusing scheme to effectively change $J_{ij}$ into $w_{ij}$.
The horizontal lines denote qubits 1, 2, and 3 and time goes
from left to right. The black rectangles represent 180° rotations.
The delay segments are of length $\alpha$, $\beta$, $\gamma$, and $\delta$. When all
segments are of equal length, all couplings are effectively
turned off [22] because $\sigma_{xi}e^{-i\sigma_{zi}\sigma_{zj}t}\sigma_{xi} = e^{i\sigma_{zi}\sigma_{zj}t}$. In our experiment,
$\alpha = 0.42$ ms, $\beta = 0$ ms, $\gamma = 4$ ms, and $\delta = 2.9$ ms
in the last slice $M + 1$. The rf pulses that implement $H_b$
perform 33.75° rotations on the qubits in the first slice.

A full implementation of an adiabatic algorithm also
requires a proper choice of $H_b$. We choose $H_b = \sum_i \sigma_{xi}$
for several reasons. First, its highest two excited states are
nondegenerate. Second, it can be easily generated using
single qubit rotations. Third, its highest excited state is
created from a pure state with all qubits in the $|0\rangle$ state by
applying a Hadamard gate on all qubits (we require the
initial state to be the highest excited state of $H_b$ because
we are optimizing for the maximum value of $H_p$).

The full adiabatic quantum algorithm is now implemented
by first creating the highest excited state of $H_b$.
We then apply $M + 1$ unitary matrices as given by Eq. (2)
and illustrated by Fig. 1(a). Accordingly, from slice to
slice, we decrease the time during which $H_b$ is active
while increasing the time during which $H_p$ is active.
Finally, we measure the quantum system and read out
the answer.
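The procedure just described can be mimicked in an idealized, decoherence-free simulation. The sketch below is not the experimental pulse program: it assumes unit strengths ($g = h = 1$), takes $H_b$ to be the sum of Pauli $X$ operators on the three qubits, uses the payoff list as the diagonal of $H_p$, and picks $M$ and $\Delta t$ arbitrarily. The function names are hypothetical.

```python
import numpy as np
from functools import reduce

payoff = np.array([0, 6, 7, 7, 5, 9, 8, 6], dtype=float)   # diagonal of H_p

def evolve_hb(tau):
    """exp(-i H_b tau) for H_b = X_1 + X_2 + X_3 (a product of one-qubit rotations)."""
    x = np.array([[0, 1], [1, 0]], dtype=complex)
    u1 = np.cos(tau) * np.eye(2) - 1j * np.sin(tau) * x
    return reduce(np.kron, [u1, u1, u1])

def run_adiabatic(big_m, dt):
    """Apply the M + 1 symmetric slices of Eq. (2) to the highest state of H_b."""
    state = np.full(8, 1 / np.sqrt(8), dtype=complex)       # Hadamards applied to |000>
    for m in range(big_m + 1):
        frac = m / big_m
        half_b = evolve_hb((1 - frac) * dt / 2)
        phase_p = np.exp(-1j * payoff * frac * dt)           # exp(-i H_p tau) is diagonal
        state = half_b @ (phase_p * (half_b @ state))
    return np.abs(state) ** 2

probs = run_adiabatic(big_m=2000, dt=0.1)
most_likely = format(int(np.argmax(probs)), "03b")
```

For a slow enough sweep such as this one, the measurement distribution is expected to concentrate on the global maximum 101 rather than the local maximum 110.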

We selected $^{13}$C-labeled CHFBr$_2$ for our experiments
[10]. The Hamiltonian of the $^1$H-$^{19}$F-$^{13}$C system is of the
form of Eq. (5) with measured couplings $J_{HC} = 224$ Hz,
$J_{HF} = 50$ Hz, and $J_{FC} = -311$ Hz. Experiments were
carried out at MIT using an 11.7 Tesla Oxford
Instruments magnet and a Varian Unity Inova spectrometer
with a triple resonance (H-F-X) probe from Nalorac.
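As a rough consistency check, the refocusing relations can be evaluated with these couplings and the delay lengths quoted in the Fig. 2 caption. The sketch below rests on two assumptions that the text does not state explicitly: that qubits 1, 2, 3 correspond to H, F, C, and that the sign pattern of the delay segments is $(+,+,-,-)$, $(+,-,-,+)$, $(+,-,+,-)$ for the pairs 12, 13, 23. The function name is hypothetical.

```python
def effective_couplings(alpha, beta, gamma, delta, j12, j13, j23):
    """Net delay-weighted couplings produced by the four-segment refocusing scheme."""
    return (
        (alpha + beta - gamma - delta) * j12,
        (alpha - beta - gamma + delta) * j13,
        (alpha - beta + gamma - delta) * j23,
    )

# Assumed labelling: 1 = H, 2 = F, 3 = C, so J12 = J_HF, J13 = J_HC, J23 = J_FC (Hz).
j_hf, j_hc, j_fc = 50.0, 224.0, -311.0
off = effective_couplings(1.0, 1.0, 1.0, 1.0, j_hf, j_hc, j_fc)
last_slice = effective_couplings(0.42e-3, 0.0, 4e-3, 2.9e-3, j_hf, j_hc, j_fc)
ratios = [c / last_slice[1] for c in last_slice]
```

Equal segment lengths give zero net coupling, and under the assumed labelling the quoted delays give ratios near 2 : 1 : 3, the ratio of the chosen edge weights; the small deviation may simply reflect rounding of the quoted delays.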

The  experiments  were  performed  at  room  temperature 
at  which  the  thermal  equilibrium  state  is  highly  mixed 
and  cannot  be  turned  into  the  required  initial  state  by  just 
unitary  transforms.  We  thus  first  created  an  approximate 
effective  pure  state  as  in  Ref.  [10]  by  summing  over  three 
temporal  labeling  experiments. 




FIG. 3. Plot of the absolute value of the deviation density
matrix for $M = 100$ ($T = 374$ ms), $M = 30$ ($T = 115$ ms), and
$M = 15$ ($T = 59.2$ ms), adjusted by an identity portion such
that the minimum diagonal value equals zero. The scale is
arbitrary but the same for each plot.
FIG. 4. Experimental performance of the adiabatic algorithm.
(a) Plot of the error as a function of $M$. The error
measure is the trace distance $D(\rho, \sigma) = |\rho - \sigma|/2$, where $\sigma$
is the traceless deviation density matrix for $M = 400$, approximating
$M \to \infty$, and $\rho$ equals the ideal expected (O), the
experimentally obtained (X), or the ideal expected traceless
deviation density matrix with decoherence effects (O) [6]. The
minimum error occurs at about $M = 60$, indicating an optimal
run time of the algorithm. (b) A similar observation can be
made when plotting $|101\rangle\langle 101|$ as a function of $M$.
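The error measure in the caption can be computed with a few lines of linear algebra. The sketch below assumes that $|\rho - \sigma|$ denotes the trace norm (the sum of singular values), which is the usual reading but is not spelled out in the text; the function name and the two test matrices are illustrative only.

```python
import numpy as np

def trace_distance(rho, sigma):
    """Half the trace norm of (rho - sigma), computed from its singular values."""
    singular_values = np.linalg.svd(rho - sigma, compute_uv=False)
    return 0.5 * singular_values.sum()

proj0 = np.array([[1, 0], [0, 0]], dtype=complex)   # |0><0|
proj1 = np.array([[0, 0], [0, 1]], dtype=complex)   # |1><1|
d_orthogonal = trace_distance(proj0, proj1)
d_same = trace_distance(proj0, proj0)
```

With normalized density matrices the distance ranges from 0 for identical states to 1 for orthogonal ones; for the deviation matrices in the figure the scale is arbitrary, as the caption of Fig. 3 notes.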

In our experiments, we actually implemented $0.5H_p$
and $0.5887H_b$ instead of $H_p$ and $H_b$. This ensures that the
error due to the second order Trotter approximation is
sufficiently small. We also choose $g$ so the applied rf field
does not heat the sample, and $g \gg h$ so $J_{ij}$ can be ignored
when applying $H_b$. All of these choices result in a total
experimental time that is within the shortest $T_2$ decoherence
time [10]. We reconstructed the traceless deviation
density matrices upon completion of the experiments
using quantum state tomography [10].

We executed this algorithm for several $M$ [with $w_i$ and
$w_{ij}$ as listed above Eq. (4)]. Since we chose $\Delta t$ to be
constant, this meant increasing the run time $T$ of the
algorithm. The reconstructed deviation density matrices
are shown in Fig. 3. The plots clearly display the expected
pure state $|101\rangle$. The local maximum at $s = 110$ has a
decreasingly small probability of being measured for
increasing $M$. Simulations using Eq. (2) show that this
optimization algorithm performs better for increasing
$M$. We wanted to verify whether this is indeed true
experimentally.

For this purpose, we estimate the error of our obtained
deviation density matrices compared with the ideal case
of $M = \infty$. Figure 4(a) plots the trace distance as a
function of $M$, using the same arbitrary scale as in
Fig. 3. From the plot, we observe there exists an optimal
run time of the algorithm, corresponding to 0.226 s in our
experiment. This optimal run time is in good agreement
with the prediction of a previously developed simple
decoherence model [6]. Predicting the impact of decoherence
has already provided invaluable insight into estimating
errors in previous experiments [6], and we believe
continued effort towards understanding decoherence will
greatly benefit experimental investigations of quantum
systems.

In conclusion, we have provided the first experimental
demonstration of an adiabatic quantum optimization algorithm.
We show a concrete procedure turning a continuous
time adiabatic quantum algorithm into a discrete time
version, even when certain restrictions apply to the accessible
Hamiltonians. Our results indicate that there
exists an optimal run time of the algorithm which can
be roughly predicted using a simple decoherence model.
We believe this implementation opens the door to a variety
of interesting experimental demonstrations and investigations
of adiabatic quantum algorithms.
ABSTRACT

Quantum  computation  has  become  an  intriguing  technology  with 
which  to  attack  difficult  problems  and  to  enhance  system  security. 
Quantum  algorithms,  however,  have  been  analyzed  under  idealized 
assumptions  without  important  physical  constraints  in  mind.  In  this 
paper,  we  analyze  two  key  constraints:  the  short  spatial  distance  of 
quantum  interactions  and  the  short  temporal  life  of  quantum  data. 

In particular, quantum computations must make use of extremely
robust error correction techniques to extend the life of quantum
data. We present optimized spatial layouts of quantum error correction
circuits for quantum bits embedded in silicon. We analyze the
complexity of error correction under the constraint that interaction
between these bits is near neighbor and data must be propagated
via swap operations from one part of the circuit to another.

We discover two interesting results from our quantum layouts.
First, the recursive nature of quantum error correction circuits requires
an additional communication technique more powerful than
near-neighbor swaps: too much error accumulates if we attempt
to swap over long distances. We show that quantum teleportation
can be used to implement recursive structures. We also show that
the reliability of the quantum swap operation is the limiting factor
in solid-state quantum computation.
1. INTRODUCTION

Physical systems that behave quantum-mechanically have dynamics
which can be exploited to speed up certain computational
tasks. This is the essential thought behind the field of quantum
computation and quantum information. A significant challenge
arises in implementing quantum computation, however, because
quantum systems are unstable: their quantum state is easily altered
by omnipresent extraneous noise. This problem of decoherence
was once thought to be a fundamental problem for quantum information
processing [8], but the discovery of fault-tolerant constructions
[1, 23, 27, 10] changed this; it is now known that an arbitrarily
reliable quantum computer can be constructed from unreliable
quantum wires and gates, as long as certain conditions are met.
These constructions are made possible by recursive application of
quantum error correction, generalizing the classical version of von
Neumann's early constructions for reliable automata [37, 39].

The conditions for fault-tolerant quantum computation are as follows:
First, the probability of failure of each elementary component
must be less than some threshold value $p_{th}$, currently estimated
to be around $10^{-4}$. Second, current fault models assume that
errors are independent and uniformly distributed (although other
error models can also be dealt with by changing the scheme appropriately).
Third, and most interesting, a variety of assumptions
are made about both the quantum circuit and the necessary classical
controller. In particular, it is essential that the quantum circuit
employ maximum parallelism - executing as many quantum gates
simultaneously as possible - and that the classical circuitry controlling
the quantum operations run at a much higher clock speed
than the quantum circuitry. Without these properties, $p_{th}$ decreases
significantly [1, 10].

Here, we take this study one step further, and consider the impact
of physical layout on the requirements for fault-tolerant quantum
computation. Do realistic physical implementations of these
machines allow achievable fault-tolerance thresholds? In particular,
what constraints must be satisfied in the architectural design
of a quantum computer in order to allow a reliable machine to be
realized?

Such questions can now be seriously considered in light of recent
progress in the physical implementation of quantum computers,
with a wide variety of systems ranging from spins in molecules [9]
and single photons [18], to spins in semiconductors [16], trapped
ions [19, ?], and superconducting systems [36], among others. These
systems have led to successful demonstrations of a wide variety of
quantum information processing tasks, including quantum teleportation
[4], creation of multiple quantum-bit entangled states [24],
fast quantum search [7, 14], and recently, Shor's fast quantum factoring
algorithm [35], in factoring the number fifteen, using a seven
quantum bit (qubit) machine.
Figure 1: Basic quantum gates and their matrix representations


Among  these  implementations,  the  solid  state  systems  are  per¬ 
haps  the  most  intriguing,  because  of  the  extensive  investment  that 
has  been  made  in  semiconductor  technology  for  conventional  clas¬ 
sical  computing,  and  the  potential  for  scaling  to  large  numbers  of 
qubits.  One  such  scheme,  proposed  by  Kane,  is  particularly  well 
suited  for  architectural  study;  it  captures  common  elements  from 
the  whole  range  of  implementations,  using  the  nuclear  spins  of 
dopant  atoms  in  silicon  as  qubits,  classically  controlled  metal  elec¬ 
trodes  for  control  of  quantum  gates,  and  near  neighbor,  planar  spin- 
spin  interactions  for  multi-qubit  gates.  This  scheme  is  also  suitable 
for  VLSI  style  CAD  layout  and  modeling,  and  reveals  an  interest¬ 
ing  constraint  arising  from  pitch-matching  large  classical  wires  to 
small  qubits,  which  forces  computation  units  to  be  distributed  in 
clusters  rather  than  a  single  sea-of-qubits  structure  [22]. 

Our  study  of  the  architectural  constraints  on  fault-tolerant  quan¬ 
tum  computation  builds  on  the  scenario  posed  by  the  Kane  solid- 
state  implementation  proposal,  and  within  this  framework  we  ob¬ 
tain  several  interesting  results.  We  first  present  complete  layouts  of 
qubits  and  gate  sequences  required  to  implement  a  concatenated 
seven-qubit  Steane  code  for  recursive  quantum  error  correction. 
These  layouts  give  us  analytic  expressions  for  the  circuit’s  space 
and  time  resource  requirements  as  a  function  of  desired  system  re¬ 
liability.  We  also  consider  the  impact  of  planar  near  neighbor  inter¬ 
actions  on  pth  and  find  that  a  huge  limiting  role  will  be  played  by  a 
single  gate,  the  SWAP  gate,  in  determining  achievable  reliabilities. 

We  begin  our  study  in  the  next  two  sections  with  a  brief  overview 
of  quantum  computation  and  error  correction  in  quantum  systems. 
In  Section  4,  we  discuss  the  model  we  will  be  using  for  the  rest  of 
the  paper,  and  the  limitations  it  and  similar  models  impose.  Sec¬ 
tion  5  discusses  implementations  for  error  correction  codes,  while 
Section 6 discusses the impact of communication on error correction
algorithms. Finally, Section 7 discusses future work, while
Section  8  concludes. 

2.  QUANTUM  COMPUTATION 

We  begin  with  a  brief  overview  of  the  basic  terminology  and 
constructs  of  quantum  computation.  Our  purpose  is  to  introduce  the 
language  necessary  for  subsequent  sections;  in-depth  treatments  of 
these  subjects  are  available  in  the  literature  [21]. 

2.1  Quantum  States:  Qubits 

The state of a classical digital system X can be specified by a
binary string x composed of a number of bits $x_i$, each of which
uniquely characterizes one elementary piece of the system. For n
bits, there are $2^n$ possible states. The state of an analogous quantum
system $\psi$ is described by a complex-valued vector
$|\psi\rangle = \sum_x c_x |x\rangle$, a weighted combination (a
“superposition”) of the basis vectors $|x\rangle$, where the probability
amplitudes $c_x$ are complex numbers whose modulus squared sums to one,
$\sum_x |c_x|^2 = 1$.

A single quantum bit is commonly referred to as a qubit and
is described by the equation $|\psi\rangle = c_0|0\rangle + c_1|1\rangle$,
where the $c_i$ are complex valued. Legal qubit states include pure states,
such as $|0\rangle$ and $|1\rangle$, and states in superposition, such as
$\frac{1}{\sqrt{2}}|0\rangle + \frac{1}{\sqrt{2}}|1\rangle$, or
$\frac{1}{2}|0\rangle - \frac{\sqrt{3}}{2}|1\rangle$. Larger quantum systems
can be composed from multiple qubits, for example, $|00\rangle$, or
$\frac{1}{\sqrt{3}}|00\rangle + \frac{1}{\sqrt{3}}|01\rangle - \frac{1}{\sqrt{3}}|11\rangle$.
An n-qubit state is described by $2^n$ basis vectors, each with its own
complex probability amplitude, so an n-qubit system can exist in
an arbitrary superposition of the possible $2^n$ classical states of the
system.

Unlike  the  classical  case,  however,  where  the  total  can  be  com¬ 
pletely  characterized  by  its  parts,  the  state  of  larger  quantum  sys¬ 
tems  cannot  always  be  described  as  the  product  of  its  parts.  This 
property, known as entanglement, is best illustrated with an example:
there exist no single-qubit states $|\psi_a\rangle$ and $|\psi_b\rangle$ such that
the two-qubit state $|\Psi\rangle = \frac{1}{\sqrt{2}}|00\rangle + \frac{1}{\sqrt{2}}|11\rangle$
can be expressed as the composite state$^1$ $|\psi_a\rangle \otimes |\psi_b\rangle$.
Entanglement and superposition have no classical analogues: they give
quantum computers their computational powers.

Although  a  quantum  system  may  exist  in  a  superposition  of  states, 
only  one  of  those  states  can  be  observed,  or  measured.  After  mea¬ 
surement,  the  system  is  no  longer  in  superposition:  the  quantum 
state collapses into the one state measured, and the probability
amplitude of all other states goes to 0. For example, when the state
$\frac{1}{\sqrt{2}}|00\rangle + \frac{1}{\sqrt{2}}|11\rangle$ is measured,
the result is either 00 or 11, with equal probability; the outcomes
$|01\rangle$ or $|10\rangle$ never occur. Furthermore, if a subset of the
qubits in a system is measured, the remaining qubits are left in a state
consistent with the measurement.
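
The properties described above (normalization, entanglement, and measurement statistics) can be checked numerically. The following is a minimal sketch using NumPy state vectors; the helper names `is_normalized`, `schmidt_rank`, and `measurement_probabilities` are illustrative and do not come from the paper. A two-qubit state is a product state exactly when its 2x2 amplitude matrix has rank one, which is the test used here for entanglement.

```python
import numpy as np

def is_normalized(state, tol=1e-12):
    """True if the squared moduli of the amplitudes sum to one."""
    return abs(np.vdot(state, state).real - 1.0) < tol

def schmidt_rank(state, tol=1e-12):
    """Number of non-zero singular values of the 2x2 amplitude matrix
    of a two-qubit state. Rank 1 means a product state, rank 2 means
    the state is entangled."""
    singular_values = np.linalg.svd(state.reshape(2, 2), compute_uv=False)
    return int(np.sum(singular_values > tol))

def measurement_probabilities(state, tol=1e-12):
    """Map each basis string with non-zero amplitude to its probability."""
    n = state.size.bit_length() - 1  # number of qubits
    return {
        format(index, f"0{n}b"): float(abs(amplitude) ** 2)
        for index, amplitude in enumerate(state)
        if abs(amplitude) ** 2 > tol
    }

# The entangled state (|00> + |11>)/sqrt(2) from the text.
bell = np.array([1, 0, 0, 1], dtype=complex) / np.sqrt(2)

# A product state |0> (x) (|0> + |1>)/sqrt(2) for comparison.
product = np.kron(
    np.array([1, 0], dtype=complex),
    np.array([1, 1], dtype=complex) / np.sqrt(2),
)
```

For the state `bell`, `measurement_probabilities` reports only the outcomes 00 and 11, each with probability one half, and `schmidt_rank` returns 2, confirming that no pair of single-qubit states reproduces it.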

Since  measurement  of  a  quantum  system  only  produces  a  sin¬ 
gle  result,  quantum  algorithms  must  maximize  the  probability  that 
the result measured is the result desired. This may be accomplished
by iteratively amplifying the desired result, as in Grover’s
fast database search, which takes $O(\sqrt{n})$ steps for a dataset of size n [11]. Another
option  is  to  arrange  the  computation  such  that  it  does  not  matter 
which  of  many  random  results  is  measured  from  a  qubit  vector. 
This  method  is  used  in  Shor’s  algorithm  for  factoring  the  product 
of  two  large  primes  [26],  which  is  built  upon  modular  exponenti¬ 
ation  and  a  quantum  Fourier  transform.  For  the  interested  reader, 
quantum  algorithms  for  a  variety  of  problems  other  than  search  and 
factoring  have  been  developed:  adiabatic  solution  of  optimization 
problems (the quantum analogue of simulated annealing; complexity
unknown) [5], precise clock synchronization (using EPR pairs
to synchronize GPS satellites) [15, 6], quantum key distribution
(provably secure distribution of classical cryptographic keys) [3],
and very recently, Gauss sums [33], testing of matrix multiplication
(in $O(n^{1.75})$ steps versus the $O(n^2)$ required classically) [13],
and Pell’s equation [12].

1 The composition operator for quantum systems is the tensor product,
$\otimes$: $|x\rangle \otimes |y\rangle = \sum_x c_x|x\rangle \otimes \sum_y c_y|y\rangle = \sum_{x,y} c_x c_y |x \otimes y\rangle$,
where $x \otimes y$ is simply the string formed by concatenating x and y.

Figure 2: Quantum teleportation of state $|a\rangle$. First, entangled
qubits $|b\rangle$ and $|c\rangle$ are exchanged. Then, $|a\rangle$ is combined
with $|b\rangle$, after which two classical bits of information (double lines)
are produced via measurement (“meter” boxes). After transport, these bits
are used to manipulate $|c\rangle$ to regenerate state $|a\rangle$ at the
destination.

2.2  Quantum  Gates  and  Circuits 

Just  as  classical  bits  are  manipulated  using  gates  such  as  NOT, 
AND,  and  XOR,  qubits  are  manipulated  with  quantum  gates  such  as 
those shown in Figure 1. A quantum gate is described by a unitary
operator U. The output state vector is the operator applied to the
input vector; that is, $|\psi_{out}\rangle = U|\psi_{in}\rangle$. The classical NOT has the
quantum analogue X, which inverts the probabilities of measuring 0
and 1. The quantum analogue of XOR is the two-qubit CNOT gate:
the target qubit is inverted for those states where the source qubit
is 1. Most quantum gates, however, have no classical analogue.
The Z gate flips the relative phase of the $|1\rangle$ state, thus exchanging
$\frac{1}{\sqrt{2}}(|0\rangle + |1\rangle)$ and $\frac{1}{\sqrt{2}}(|0\rangle - |1\rangle)$.
The Hadamard gate H turns $|0\rangle$ into $\frac{1}{\sqrt{2}}(|0\rangle + |1\rangle)$
and $|1\rangle$ into $\frac{1}{\sqrt{2}}(|0\rangle - |1\rangle)$; it can be thought
of as performing a radix-2 Fourier transform. Another important
single-qubit gate, T, leaves $|0\rangle$ unchanged but multiplies $|1\rangle$ by $\sqrt{i}$.
Single-qubit gates are characterized by a rotation around an axis:
X rotates the qubit by $\pi$ around the x-axis; Z rotates by $\pi$ around
the z-axis; and T rotates by $\pi/4$ around the z-axis. By composing
the T and H gates, any single-qubit gate can be approximated to
arbitrary precision. The combination of T, H, and CNOT provides
a universal set: just as any Boolean circuit can be composed from
AND, OR, and NOT gates, any polynomially describable multi-qubit
quantum transform U can be efficiently approximated by composing
just these three quantum gates into a circuit.

One  additional  important  operator  is  the  SWAP  gate.  Just  as 
two  classical  values  can  be  swapped  using  three  XOR’s,  a  quan¬ 
tum  SWAP  can  be  implemented  as  three  CNOTs.  However,  SWAP  is 
often  available  natively  for  a  given  technology,  which  is  valuable, 
given  its  importance  to  quantum  communication. 
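
The gate relations in this section can be verified directly from the matrix representations. The sketch below writes the gates as NumPy arrays in the basis order |00>, |01>, |10>, |11>, with the left qubit as the most significant bit; the names `CNOT_01`, `CNOT_10`, and `is_unitary` are illustrative choices rather than notation from the paper.

```python
import numpy as np

# Single-qubit gates.
X = np.array([[0, 1], [1, 0]], dtype=complex)
Z = np.array([[1, 0], [0, -1]], dtype=complex)
H = np.array([[1, 1], [1, -1]], dtype=complex) / np.sqrt(2)
T = np.array([[1, 0], [0, np.exp(1j * np.pi / 4)]], dtype=complex)

# CNOT with the left qubit as control and the right qubit as target.
CNOT_01 = np.array(
    [[1, 0, 0, 0],
     [0, 1, 0, 0],
     [0, 0, 0, 1],
     [0, 0, 1, 0]], dtype=complex)

# CNOT with the right qubit as control and the left qubit as target.
CNOT_10 = np.array(
    [[1, 0, 0, 0],
     [0, 0, 0, 1],
     [0, 0, 1, 0],
     [0, 1, 0, 0]], dtype=complex)

# SWAP exchanges |01> and |10> and leaves |00> and |11> alone.
SWAP = np.array(
    [[1, 0, 0, 0],
     [0, 0, 1, 0],
     [0, 1, 0, 0],
     [0, 0, 0, 1]], dtype=complex)

def is_unitary(U):
    """True if U's conjugate transpose is its inverse."""
    return np.allclose(U.conj().T @ U, np.eye(U.shape[0]))

# Three alternating CNOTs, mirroring the classical three-XOR swap.
swap_from_cnots = CNOT_01 @ CNOT_10 @ CNOT_01
```

Comparing `swap_from_cnots` with `SWAP` confirms the three-CNOT construction. The same arrays show that four applications of T equal Z, and that the phase T applies to |1> squares to i.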

Figure  2  shows  a  quantum  circuit  for  teleportation  (described  in 
the  next  section).  In  quantum  circuits,  time  goes  from  left  to  right, 
where  single  lines  represent  qubits,  and  double  lines  represent  clas¬ 
sical  bits.  A  meter  represents  measurement.  By  convention,  black 
dots  represent  control  terminals  for  quantum-controlled  gates.  The 
symbol $\oplus$ is shorthand for the target qubit of the CNOT gate.

2.3  Quantum  Teleportation 

Quantum  teleportation  is  the  re-creation  of  a  quantum  state  at  a 
distance, using only classical communication. It accomplishes this
feat by using a pair of entangled qubits,
$|\Psi\rangle = \frac{1}{\sqrt{2}}(|00\rangle + |11\rangle)$,
called an EPR pair$^2$.

Figure  2  gives  an  overview  of  the  teleportation  process.  We  start 
by  generating  an  EPR  pair.  We  separate  the  pair,  keeping  one  qubit, 
$|b\rangle$, at the source and transporting the other, $|c\rangle$, to the destination.
When we want to send a qubit, $|a\rangle$, we first interact $|a\rangle$ with $|b\rangle$
using a CNOT gate. We then measure the phase and the amplitude
of $|a\rangle$, send the two one-bit classical results to the destination, and
use those results to re-create the correct phase and amplitude in $|c\rangle$
such that it takes on the original state of $|a\rangle$. The re-creation of
phase and amplitude is done with X and Z gates, whose application
is contingent on the outcome of the measurements of $|a\rangle$ and $|b\rangle$.
Intuitively, since $|c\rangle$ has a special relationship with $|b\rangle$, interacting
$|a\rangle$ with $|b\rangle$ makes $|c\rangle$ resemble $|a\rangle$, modulo a phase and/or amplitude
error. The two measurements allow us to correct these errors
and re-create $|a\rangle$ at the destination. Note that the original state of
$|a\rangle$ is destroyed when we take our two measurements$^3$.
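
The teleportation procedure above can be simulated on a three-qubit state vector. The sketch below assumes the standard textbook circuit, in which the CNOT between |a> and |b> is followed by a Hadamard on |a> before both qubits are measured; that Hadamard and all helper names (`apply_1q`, `apply_cnot`, `measure`, `teleport`) are assumptions of this sketch, not details taken from Figure 2. Qubits are ordered a, b, c with a leftmost.

```python
import numpy as np

H = np.array([[1, 1], [1, -1]], dtype=complex) / np.sqrt(2)
X = np.array([[0, 1], [1, 0]], dtype=complex)
Z = np.array([[1, 0], [0, -1]], dtype=complex)

def apply_1q(state, gate, q, n=3):
    """Apply a single-qubit gate to qubit q (0 = leftmost)."""
    psi = state.reshape([2] * n)
    psi = np.tensordot(gate, psi, axes=([1], [q]))  # new axis lands in front
    psi = np.moveaxis(psi, 0, q)                    # move it back to slot q
    return psi.reshape(-1)

def apply_cnot(state, control, target, n=3):
    """Flip the target qubit on the half of the state where control is 1."""
    psi = state.reshape([2] * n).copy()
    index = [slice(None)] * n
    index[control] = 1
    block = psi[tuple(index)]                       # control axis removed
    axis = target if target < control else target - 1
    psi[tuple(index)] = np.flip(block, axis=axis).copy()
    return psi.reshape(-1)

def measure(state, q, rng, n=3):
    """Measure qubit q; return the outcome and the collapsed state."""
    psi = state.reshape([2] * n)
    p_one = float(np.sum(np.abs(np.take(psi, 1, axis=q)) ** 2))
    outcome = int(rng.random() < p_one)
    index = [slice(None)] * n
    index[q] = outcome
    collapsed = np.zeros_like(psi)
    collapsed[tuple(index)] = psi[tuple(index)]
    collapsed = collapsed.reshape(-1)
    return outcome, collapsed / np.linalg.norm(collapsed)

def teleport(alpha, beta, rng):
    """Teleport alpha|0> + beta|1> from qubit a to qubit c."""
    a = np.array([alpha, beta], dtype=complex)
    epr = np.array([1, 0, 0, 1], dtype=complex) / np.sqrt(2)  # qubits b, c
    state = np.kron(a, epr)
    state = apply_cnot(state, control=0, target=1)
    state = apply_1q(state, H, 0)
    m_a, state = measure(state, 0, rng)
    m_b, state = measure(state, 1, rng)
    if m_b:                       # amplitude (bit-flip) correction
        state = apply_1q(state, X, 2)
    if m_a:                       # phase correction
        state = apply_1q(state, Z, 2)
    received = state.reshape(2, 2, 2)[m_a, m_b, :]
    return received, (m_a, m_b)
```

Whatever the two measured bits turn out to be, the state returned for qubit c matches the input amplitudes, while qubit a is left in a definite basis state, consistent with the no-cloning remark in the text.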

Why bother with teleportation when we end up transporting $|c\rangle$
anyway? Why not just transport $|a\rangle$ directly? First, we can
pre-communicate EPR pairs with extensive pipelining without stalling
computations. Second, it is easier to transport EPR pairs than real
data. Since $|b\rangle$ and $|c\rangle$ have known properties, we can employ a
specialized  procedure  known  as  purification  to  turn  a  collection  of 
pairs  partially  damaged  from  transport  into  a  smaller  collection  of 
asymptotically  perfect  pairs.  Third,  transmitting  the  two  classical 
bits  resulting  from  the  measurements  is  more  reliable  than  trans¬ 
mitting  quantum  data. 

3.  FAULT-TOLERANT  COMPUTATION 

We  turn  now  to  an  outline  of  the  basic  constructions  of  fault- 
tolerant  quantum  computation.  This  is  a  rather  involved  subject 
(for  which  the  reader  is  referred  to  the  literature  [21,  10]),  but  three 
essential  ideas  are  covered  here.  The  main  result  we  build  upon 
is the following: A quantum circuit containing N error-free gates
can be simulated with a probability of failure of at most $\epsilon$ using
$O(\mathrm{poly}(\log(N/\epsilon))\,N)$ imperfect gates which fail with probability p
as long as p < pth, where pth is a constant threshold that is independent
of N. This remarkable result, the Threshold Theorem [1], is
achieved  by  three  steps:  (1)  using  quantum  error-correction  codes 
(Section  3.1),  (2)  performing  all  computations  on  encoded  data, 
using  fault  tolerant  procedures  (Section  3.2),  and  (3)  recursively 
encoding  until  the  desired  reliability  is  obtained  (Section  3.3).  All 
of  these  results  are  from  prior  literature  [1,  28,  31,  21,  10],  but 
we  describe  them  here  to  make  our  contributions  clearer  in  future 
sections. 

3.1  Quantum  Error  Correction 

The  only  error  which  can  occur  to  a  classical  bit  is  a  bit-flip, 
which  can  be  modeled  as  a  random  NOT  gate.  Quantum  bits  suf¬ 
fer  more  kinds  of  error,  because  of  the  greater  degree  of  freedom 
in their state representation; surprisingly, however, there are general
strategies for reducing the universe of possible quantum errors
to only two kinds: bit-flips (random X gates), and phase-flips
(random  Z  gates).  Classical  error  correction  codes  only  take  into 
account  bit  flip  errors,  and  thus  are  insufficient  for  correcting  quan¬ 
tum  data;  furthermore,  quantum  states  collapse  upon  measurement, 
so  strategies  must  be  employed  for  determining  errors  without  ac¬ 
tually  measuring  encoded  data. 

2 An EPR or Einstein-Podolsky-Rosen pair is a special instance of
entanglement  noted  in  the  Einstein-Podolsky-Rosen  paradox  [2]. 

3  This  is  consistent  with  the  no-cloning  theorem,  which  states  that 
a  quantum  state  cannot  be  copied. 
Figure 3: Quantum circuit for measuring $Z_{12}$, the phase difference
between $\psi_2$ and $\psi_1$. The meter box indicates measurement
and double lines indicate classical information.


Table  1:  Phase  correction  for  a  3-qubit  code 


Classical  error  correction  relies  upon  distributing  k  bits  of  infor¬ 
mation  across  n  bits,  n  >  k,  and  ensuring  enough  redundancy  to 
recreate  the  original  information.  Because  of  the  no-cloning  the¬ 
orem,  quantum  information  cannot  be  simply  duplicated.  Instead, 
redundancy is achieved through entangled states with known properties.
For example, a single logical qubit, $c_0|0_L\rangle + c_1|1_L\rangle$, can
be represented using three physical qubits, as the state
$c_0|000\rangle + c_1|111\rangle$. A bit-flip error on the first (left-most) qubit
would turn this into $c_0|100\rangle + c_1|011\rangle$; this error can be detected
by computing the parity of each pair of qubits, and leaving the result in an
extra qubit called an ancilla. The three parities give the error syndrome,
uniquely locating any single bit-flip error. Crucially, this
strategy reveals nothing about the coefficients $c_0$ and $c_1$, since the
parities cannot distinguish between $|000\rangle$ and $|111\rangle$ or any single
bit-flip version of the two three-qubit strings. By measuring parities,
errors can be detected without collapsing encoded data.
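
The three-qubit bit-flip code described above can be sketched as a small state-vector example. The helper names (`encode`, `flip`, `syndrome`, `correct`) and the lookup table are illustrative. For brevity, the sketch reads the pairwise parities off the state directly instead of simulating the ancilla qubits that would hold them in a real circuit.

```python
import numpy as np
from itertools import combinations

def encode(c0, c1):
    """Return c0|000> + c1|111> as an 8-element state vector."""
    state = np.zeros(8, dtype=complex)
    state[0b000] = c0
    state[0b111] = c1
    return state

def flip(state, q):
    """Apply a bit-flip (X gate) to qubit q, where 0 is the left-most."""
    mask = 1 << (2 - q)
    flipped = np.empty_like(state)
    for index in range(8):
        flipped[index ^ mask] = state[index]
    return flipped

def syndrome(state, tol=1e-12):
    """Parities of the qubit pairs (0,1), (0,2), (1,2).

    Every basis string present in the state shares the same parities,
    so reading them reveals nothing about c0 and c1."""
    parities = []
    for i, j in combinations(range(3), 2):
        values = {
            ((index >> (2 - i)) ^ (index >> (2 - j))) & 1
            for index in range(8)
            if abs(state[index]) > tol
        }
        if len(values) != 1:
            raise ValueError("state is not in the code space plus one flip")
        parities.append(values.pop())
    return tuple(parities)

# Syndrome -> qubit to flip back (None means no error detected).
LOOKUP = {(0, 0, 0): None, (1, 1, 0): 0, (1, 0, 1): 1, (0, 1, 1): 2}

def correct(state):
    """Undo a single bit-flip error located by the syndrome."""
    qubit = LOOKUP[syndrome(state)]
    return state if qubit is None else flip(state, qubit)
```

Flipping the left-most qubit of any encoded state yields the syndrome (1, 1, 0) regardless of the coefficients, which is the sense in which the parities locate the error without measuring the encoded data.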

Correcting  phase  flips  is  achieved  by  measuring  differences  in 
phase,  using  a  circuit  like  the  one  in  Figure  3.  This  works  by  us¬ 
ing  a  Hadamard  gate  to  transform  phase  flips  into  bit  flips;  parities 
are  then  measured  as  before,  the  results  stored  in  ancilla  qubits, 
and  then  the  qubits  are  transformed  back  into  their  original  basis. 
Figure  4  shows  how  a  phase  error  syndrome  can  be  computed  and 
a  corresponding  correction  procedure  applied  to  correct  the  error, 
following the specification of Table 1.

A quantum code which encodes one qubit and allows any single
bit-flip or phase-flip error to be corrected uses the encoding
$c_0|0_L\rangle + c_1|1_L\rangle$, where the logical zero and one qubits are

$$|0_L\rangle = \frac{(|000\rangle + |111\rangle) \otimes (|000\rangle + |111\rangle) \otimes (|000\rangle + |111\rangle)}{\sqrt{8}}$$

$$|1_L\rangle = \frac{(|000\rangle - |111\rangle) \otimes (|000\rangle - |111\rangle) \otimes (|000\rangle - |111\rangle)}{\sqrt{8}}$$

This nine-qubit code, discovered by Peter Shor [28], is also known
as the [[9,1,3]] code, in the notation [[n,k,d]], where n is the number
of physical qubits, k is the number of logical qubits encoded,
and d is the quantum Hamming distance of the code. A code with
distance d is able to correct $(d-1)/2$ errors.

3.2  Computing  on  Encoded  Data 

The nine-qubit code has a remarkable property that illustrates a
key requirement for fault tolerance: applying a Z gate to each of
the nine qubits takes $|0_L\rangle$ to $|1_L\rangle$ and vice versa. It is the same as
applying a logical $\bar{X}$ operator$^4$ to the encoded qubit! Similarly, $\bar{Z}$
can be performed by applying an X operator to each qubit, and $\bar{H}$
by applying an H operator to each qubit.

Figure 4: Syndrome measurement for a 3-qubit code. The
classical results of measurement (double lines) control application
of the Z operator.

In  this  paper,  we  employ  Steane’s  [[7, 1,3]]  code  [30],  which  also 
allows  simple  computation  on  encoded  data,  but  requires  two  fewer 
physical  qubits.  In  addition,  a  CNOT  gate  on  two  encoded  qubits 
can  be  accomplished  using  seven  CNOT  gates,  between  each  pair  of 
corresponding  physical  qubits.  The  last  remaining  gate  necessary 
to  achieve  the  universal  set  from  Section  2.2,  the  T  gate,  can  also 
be  performed,  albeit  with  some  extra  effort  [21].  Thus,  universal 
computation  is  possible  without  requiring  that  the  data  be  decoded. 

Merely  computing  on  encoded  data  is  not  sufficient,  however; 
one  additional  step  is  required,  which  is  frequent,  periodic  error 
correction.  Because  all  gates  used  in  this  task  are  assumed  to  be 
subject  to  failure,  this  must  be  done  in  a  careful  manner,  such  that 
no  single  gate  failure  can  lead  to  more  than  one  error  in  each  en¬ 
coded  qubit  block.  Such  constructions  are  known  as  fault  toler¬ 
ant  procedures ,  and  the  impact  of  this  requirement  on  our  study 
is  twofold:  (1)  no  single  operation  may  cause  multiple  failures, 
and  (2)  measurement  errors  must  not  be  allowed  to  propagate  ex¬ 
cessively.  To  achieve  (1),  no  two  encoding  qubits  are  allowed 
to  both  interact  directly  with  a  third  qubit.  Instead,  the  “third” 
qubit is replaced with a cat state (a generalization of an EPR pair),
$\frac{1}{\sqrt{2}}|00\ldots0\rangle + \frac{1}{\sqrt{2}}|11\ldots1\rangle$,
that has itself been verified. Cat states
are used because they do not transmit errors through CNOT gates.
To achieve (2), each measurement is performed redundantly.
While  it  is  not  possible  to  copy  a  value  before  measuring,  it  is  pos¬ 
sible  to  form  a  three-qubit  state,  similar  to  the  three-qubit  bit-flip 
encoding  (Section  3.1),  where  all  of  the  qubits  should  measure  to 
the  same  value;  if  one  of  the  measurements  differs,  it  is  assumed  to 
be  in  error.  These  impacts  are  explained  in  detail  in  later  examples. 

Any logical operator may be applied as a fault tolerant procedure,
as long as the probability, p, of an error for a physical operator
is below a certain threshold, 1/c, where c is determined by
the implementation of the error correction code. For the Steane
[[7,1,3]] code, c is about $10^4$. The overall probability of error for
the logical operator is $cp^2$. That is, at some step in the application
of  the  operator,  and  subsequent  error  correction,  two  errors  would 
have  to  occur  in  order  for  the  logical  operator  to  fail. 

3.3  Recursive  Error  Correction 

A  very  simple  construction  allows  us  to  tolerate  additional  errors. 
If  a  logical  qubit  is  encoded  in  a  block  of  n  qubits,  it  is  possible  to 
encode  each  of  those  n  qubits  with  an  m-qubit  code  to  produce  an 
mn-qubit encoding. Such recursion, or concatenation, of codes can
reduce the overall probability of error even further. For example,
concatenating the [[7,1,3]] code with itself gives a [[49,1,9]] code with
an overall probability of error of $c(cp^2)^2$ (see Figure 5). Concatenating
it $k-1$ times gives $(cp)^{2^k}/c$, while the size of the circuit
increases by $d^k$ and the time complexity increases by $t^k$, where d
is the increase in circuit complexity for a single encoding, and t is
the increase in operation time for a single encoding. For a circuit
of size p(n), to achieve a desired probability of success of $1 - \epsilon$, k
must be chosen such that [21]:

$$\frac{(cp)^{2^k}}{c} \le \frac{\epsilon}{p(n)}$$

The number of operators required to achieve this result is

$$O(\mathrm{poly}(\log(p(n)/\epsilon))\,p(n)).$$

4 The overscore denotes an operator on a logical qubit: a logical
operator.

Figure 5: Tree structure of concatenated codes
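
The concatenation bound above lends itself to a short numerical sketch. The function names `logical_error` and `levels_needed`, the `max_levels` cap, and the example numbers are illustrative assumptions; only the formula $(cp)^{2^k}/c \le \epsilon/p(n)$ is taken from the text.

```python
def logical_error(p, c, k):
    """Failure probability per logical gate after k levels of encoding:
    (c * p) ** (2 ** k) / c. With k = 0 this is just p."""
    return (c * p) ** (2 ** k) / c

def levels_needed(p, c, circuit_size, eps, max_levels=20):
    """Smallest k with (c p)^(2^k) / c <= eps / circuit_size.

    Raises ValueError if p is not below the threshold 1/c, because
    concatenation then makes the error rate worse instead of better."""
    if c * p >= 1:
        raise ValueError("physical error rate is not below the threshold 1/c")
    target = eps / circuit_size
    for k in range(max_levels + 1):
        if logical_error(p, c, k) <= target:
            return k
    raise ValueError("target not reached within max_levels")

# Hypothetical numbers: c = 1e4, gates ten times better than threshold,
# a circuit of 1e9 gates, and an allowed overall failure probability of 0.1.
k = levels_needed(p=1e-5, c=1e4, circuit_size=1e9, eps=0.1)
physical_qubits_per_logical = 7 ** k  # [[7,1,3]] concatenated k times
```

With these assumed inputs the loop stops at k = 3, so each logical qubit would occupy 343 physical data qubits before ancillae are counted. The doubly exponential decrease in error is what keeps the overhead polylogarithmic in the circuit size.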

4.  TECHNOLOGY  MODEL 

With  some  basics  of  quantum  operations  in  mind,  we  turn  our 
attention  to  the  technologies  available  to  implement  these  oper¬ 
ations.  Experimentalists  have  examined  several  technologies  for 
quantum  computation,  including  trapped  ions  [19],  photons  [32], 
bulk  spin  NMR  [34],  Josephson  junctions  [20,  36],  electron  spin 
resonance  transistors  [38],  and  phosphorus  nuclei  in  silicon  (the 
“Kane”  model)  [16]  [29].  The  last  three  of  these  proposals,  which 
are  built  on  a  solid-state  silicon  substrate,  share  the  following  key 
aspects: 

1  Qubits  are  laid  out  in  silicon  in  a  2-D  fashion,  similar  to  tradi¬ 
tional  CMOS  VLSI. 

2  Quantum  interactions  are  near-neighbor  between  qubits. 

3  Qubits  are  stored  at  fixed  locations,  but  quantum  data  may  be 
swapped  between  nearest  neighbors. 

4  The  control  structures  necessary  to  manipulate  the  bits  prevent 
a  dense  2-D  grid  of  bits.  Instead,  we  have  linear  structures  of 
bits  that  can  cross,  but  that  have  a  minimum  distance  between 
such  intersections  [22].  This  restriction  is  similar  to  a  “design 
rule”  in  traditional  CMOS  VLSI. 

These  four  assumptions  apply  to  several  solid-state  technolo¬ 
gies,  but  for  concreteness,  we  will  focus  upon  an  updated  version 
of  Kane’s  phosphorus-in-silicon  nuclear-spin  proposal  [29].  This 
scheme  will  serve  as  an  example  for  the  remainder  of  the  paper, 
although  we  will  generalize  our  results  when  appropriate. 

Figure  6  illustrates  the  Kane  scheme.  Quantum  states  are  stored 
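in relatively stable electron-donor ($e^-$-$^{31}\mathrm{P}^+$) spin pairs, where the
electron (e) and the phosphorus donor nucleus (n) have opposite
spins. The basis states, $|0\rangle$ and $|1\rangle$, are defined by their relative phase:
$|0\rangle = |\uparrow_e\downarrow_n\rangle + |\downarrow_e\uparrow_n\rangle$ and
$|1\rangle = |\uparrow_e\downarrow_n\rangle - |\downarrow_e\uparrow_n\rangle$, respectively.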
Twenty  nanometers  above  the  phosphorus  atoms  lie  three  classical 
gates,  one  A  gate  and  two  S  gates.  Precisely  timed  pulses  on  these 
gates  provide  arbitrary  one-  and  two-qubit  quantum  gates. 

Single-qubit operators are composed of pulses on the A-gates,
modulating the hyperfine interaction between the electron and nucleus
to provide rotations around the z-axis. A globally applied,
static magnetic field provides rotations around the x-axis. By changing
the pulse widths, any desired rotational operator may be applied,
including the identity operator$^5$. Two-qubit interactions are
mediated by S-gates, which move an electron from one nucleus to
the next. Exact details of the pulses and quantum mechanics of
this technique are beyond the scope of this paper and are described
in [29].

Figure 6: The basic quantum bit technology proposed by Kane.
Qubits are embodied by the nuclear spin of a phosphorus atom
coupled with an electron embedded in silicon under high magnetic
field at low temperature.

The  Kane  proposal,  like  all  quantum  computing  proposals,  uses 
classical  signals  to  control  the  timing  and  sequence  of  operations. 
All  known  quantum  algorithms,  including  basic  error  correction 
for  quantum  data,  require  the  determinism  and  reliability  of  classi¬ 
cal  control.  Without  efficient  classical  control,  fundamental  results 
demonstrating  the  feasibility  of  quantum  computation  do  not  apply 
(such  as  the  Threshold  Theorem  used  in  Section  3). 

The  scale  required  by  the  Kane  model,  on  the  other  hand,  is  at 
odds  with  efficient  classical  control.  In  order  to  provide  the  fine¬ 
grained  control  necessary,  the  control  lines  need  to  operate  in  a 
classical  manner.  That  is,  there  need  to  be  enough  quantum  states 
in  the  control  lines  so  that  electron  movement  is  bulk,  not  ballistic, 
and  voltage  transitions  are  smooth  rather  than  stair-stepped.  Be¬ 
cause  of  this,  the  control  lines  need  to  be  physically  much  larger 
than  the  qubits  they  are  controlling  [22].  Conceptually,  the  control 
lines  need  to  be  of  classical  size  and  pitch,  and  packed  closely  to 
control  quantum  bits  placed  on  a  quantum  scale.  This  imposes 
a  constraint  that  qubits  be  laid  out  in  straight  lines,  with  a  certain 
minimum  number  of  qubits  between  junctions. 

Given  the  constraint  of  linearity  with  infrequent  junctions,  there 
are  several  ways  to  lay  out  physical  and  logical  qubits.  Optimally, 
qubits  should  be  arranged  to  minimize  communication  overhead. 

In  a  fault  tolerant  design,  the  main  activity  of  a  quantum  com¬ 
puter  is  error  correction.  To  minimize  communication  costs,  qubits 
in  an  encoding  block  should  be  in  close  proximity.  Assuming  that 
the  distance  between  junctions  is  greater  than  the  number  of  qubits 
in  an  encoding,  the  closest  the  qubits  can  be  is  in  a  straight  line. 
But  in  order  to  avoid  interacting  two  qubits  in  an  encoding  with  a 
third, a two-rail approach is used: one rail for data qubits, and one
for  communication. 

A  concatenated  code  requires  a  slightly  different  layout  (see  Fig¬ 
ure  7).  Error  correction  is  still  the  important  operation,  but  the 
logical  qubits  at  all  but  the  bottom  level  of  the  code  are  more  com¬ 
plicated.  For  the  second  level,  the  qubits  are  themselves  simple 
encodings,  laid  out  using  the  two-rail  construction.  However,  to 
minimize  communication  costs,  we  want  these  logical  qubits  in  as 
close  proximity  to  each  other  as  possible,  just  like  the  bottom  level. 

5  One  impact  of  the  external  magnetic  field  is  the  state  of  the  qubit 
is  in  constant  flux.  The  identity  operator  must  be  applied  on  every 
“cycle”  in  order  to  keep  the  current  state. 
Figure 7: Schematic layout of the H-tree structure of a concatenated
code. The branches in the inset represent the singly-encoded
qubits. The $|D_i\rangle$ are data qubits, and the $|A_i\rangle$ are ancillae.
The $|V_i\rangle$ are for verification.

Figure 8: Measuring the error syndrome for the [[7,1,3]]
error-correction code.


Figure 9: “Two-rail” layout for the three-qubit phase-correction code. The schematic on the left shows qubit placement and
communication, where D’s indicate data qubits, A’s are ancillae, and D′’s and A′’s are for communication. The open qubits at the bottom
swap  in  fresh  ancillae,  and  remove  used  ancillae.  The  same  layout  is  shown  as  a  quantum  circuit  on  the  right,  with  the  operations 
required  to  create  and  verify  an  ancillary  cat  state,  and  to  measure  the  parity  of  a  pair  of  data  qubits. 


Hence,  we  need  to  arrange  the  bottom  level  as  branches  coming  off 
of  a  main  bus.  Similarly,  the  third  level  would  have  second-level 
branches  coming  off  of  a  main  trunk,  and  so  on  for  higher  levels, 
forming an H-tree.

5.  ERROR  CORRECTION  ALGORITHMS 

We’ve  discussed  error  correction  in  a  general  sense,  and  how 
the  need  for  recursive  error  correction  influences  the  architectural 
design.  In  addition,  we  have  introduced  several  error-correction 
codes,  such  as  Shor’s  3-qubit  phase-flip  code,  Shor’s  9-qubit  code, 
and  Steane’s  7-qubit  code.  The  constructions  in  Figures  4  and  9 
deal  with  the  simplest  of  these  codes,  the  3-qubit  code,  which  only 
corrects  phase  flips.  In  order  to  correct  both  bit  and  phase  flips, 
a  more  complicated  code  is  needed.  For  the  remainder  of  this  pa¬ 
per,  we  will  focus  on  the  7-qubit  code,  [[7, 1,3]]  ,  which  corrects  up 
to  a  single  error,  and  recursive  codes  based  on  [[7, 1,3]]  which  can 
correct  many  errors.  We  choose  [[7, 1,3]]  because  of  the  ease  with 
which  logical  operators  may  be  applied.  In  particular,  remember 
that  the  logical  operators  X,  Z,  H,  and  CNOT  are  applied  by  apply¬ 
ing  the  simple  operator  to  each  qubit  in  the  encoding  block. 

5.1  The  [[7,1,3]]  Code 

Error  correcting  using  the  [[7,1,3]]  code  consists  of  measuring 
the parity of the encoding qubits in various bases. As shown in
Figure 8, the qubits are rotated to the measurement basis with Hadamard
gates. Parity is then measured in much the same way it is on
a  classical  code,  using  two-qubit  CNOT  operators  acting  as  XOR’s. 
Conceptually,  the  parity  can  be  measured  in  the  same  way  as  the 
three-qubit  code  in  Section  3.1,  gathering  the  parity  on  ancilla  |0)’s. 
To  perform  a  fault  tolerant  measurement,  however,  a  cat  state  is 


used  in  place  of  a  |0).  Figure  8  shows  a  schematic  for  measuring 
the  [[7,1,3]]  code.  Not  shown  are  cat- state  creation  and  cat- state 
verification.  In  addition,  each  parity  measurement  must  be  per¬ 
formed  twice  to  reduce  the  probability  of  an  error  from  O(p)  to 
0(p2);  if  the  measurements  disagree,  the  parity  must  be  measured 
a  third  time ! 

A  parity  measurement  consists  of  the  following: 

1  Prepare  a  cat  state  from  four  ancillae,  using  a  Hadamard  gate 
and  three  CNOT  gates. 

2  Verify  the  cat  state  by  taking  the  parity  of  each  pair  of  qubits.  If 
any  pair  has  odd  parity,  return  to  step  1.  (Note  that  this  requires 
six  additional  ancillae,  one  for  each  pair.) 

3 Use the four-ancilla cat state as the CNOT target of the data
qubits whose parity is to be measured.

4 Deconstruct the cat state by selecting one of the ancillae, |A_0⟩,
and using it as the CNOT target of the remaining three ancillae.
|A_0⟩ now has the overall parity of the cat state.

5 Measure this |A_0⟩:

A With |A_0⟩ = α|0⟩ + β|1⟩, create the three-qubit state α|000⟩ +
β|111⟩ by using |A_0⟩ as the control for two CNOT gates, and
two fresh |0⟩ ancillae as the targets.

B Measure each of the three qubits.

6 Use the majority measured value as the parity of the cat state.

The resulting syndrome determines which, if any, qubit has an error,
and which X, Z, or Y operator will correct the error.
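Steps 1, 3, and 4 above can be checked with a small state-vector sketch. This is an illustration under simplifying assumptions, not the circuit of Figure 8: the four data qubits are placed in a computational basis state, the verification of step 2 and the majority measurement of step 5 are omitted, and the helper names (`apply_h`, `apply_cnot`, `prob_a0_is_one`) are hypothetical.

```python
import math
from itertools import product


def apply_h(state, q):
    """Apply a Hadamard gate to qubit q of a sparse state {bits: amplitude}."""
    out = {}
    s = 1 / math.sqrt(2)
    for bits, amp in state.items():
        for new in (0, 1):
            # H maps |1> to (|0> - |1>)/sqrt(2), hence the sign on the |1> output.
            sign = -1 if (bits[q] == 1 and new == 1) else 1
            nb = bits[:q] + (new,) + bits[q + 1:]
            out[nb] = out.get(nb, 0.0) + sign * s * amp
    return {b: a for b, a in out.items() if abs(a) > 1e-12}


def apply_cnot(state, control, target):
    """Flip the target bit of every basis state whose control bit is 1."""
    out = {}
    for bits, amp in state.items():
        if bits[control]:
            bits = bits[:target] + (bits[target] ^ 1,) + bits[target + 1:]
        out[bits] = out.get(bits, 0.0) + amp
    return out


def prob_a0_is_one(data_bits):
    """Qubits 0-3 hold the data; qubits 4-7 are the cat-state ancillae (A0 = qubit 4)."""
    state = {tuple(data_bits) + (0, 0, 0, 0): 1.0}
    # Step 1: one Hadamard and three CNOTs build (|0000> + |1111>)/sqrt(2).
    state = apply_h(state, 4)
    for t in (5, 6, 7):
        state = apply_cnot(state, 4, t)
    # Step 3: each data qubit controls a CNOT onto its own cat-state qubit.
    for i in range(4):
        state = apply_cnot(state, i, 4 + i)
    # Step 4: A0 is the CNOT target of the remaining three ancillae.
    for c in (5, 6, 7):
        state = apply_cnot(state, c, 4)
    # Probability that measuring A0 yields 1.
    return sum(abs(a) ** 2 for b, a in state.items() if b[4] == 1)


for data in product((0, 1), repeat=4):
    assert abs(prob_a0_is_one(data) - sum(data) % 2) < 1e-9
```

Both branches of the cat state carry the same parity because the cat state has an even number of qubits, so |A_0⟩ reads the data parity with certainty in this idealized, noise-free setting.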

For the Steane [[7,1,3]] code, each parity measurement requires
twelve ancillae: four for the cat state to capture the parity, six to
verify the cat state, and two additional qubits to measure the cat state.
The six parity measurements are each performed at least twice, for
a minimum of 144 ancillae to measure the error syndrome! A less
complex example is shown in Figure 9.

Each of the twelve parity measurements requires:

•  One  Hadamard  and  three  CNOT’s  to  create  the  cat  state; 

•  Twelve  CNOT’s  to  verify  the  cat  state; 

•  Four  CNOT’s,  which  can  be  applied  in  parallel,  to  collect  the 
parity  of  the  data  qubits; 

•  Three  CNOT’s  and  a  Hadamard  to  uncreate  the  cat  state; 

•  Two  CNOT’s  to  create  the  three-qubit  state  for  measurement; 
and 

•  Three  qubit  measurements,  which  may  be  performed  in  parallel 
with  the  next  parity  measurement. 

If the time required to apply a single-qubit operator is S, a CNOT
is C, and a measurement is M, then the minimum time required to
measure the error syndrome is 2S + 12(2S + 24C).
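The ancilla and gate counts of this section can be tallied mechanically. The sketch below is only a bookkeeping aid: the dictionary keys and the function name `syndrome_time` are hypothetical, the total is read as 2S + 12(2S + 24C), and the leading 2S term is taken from the text as given rather than derived here. The four parity-collecting CNOTs are counted individually, which is what the figure of 24 CNOTs per measurement implies.

```python
# Per-parity-measurement counts, copied from the lists in Section 5.1.
ANCILLAE = {"cat state": 4, "cat-state verification": 6, "measurement": 2}
CNOTS = {
    "create cat state": 3,
    "verify cat state": 12,
    "collect parity": 4,
    "uncreate cat state": 3,
    "three-qubit state": 2,
}
SINGLE_QUBIT = {"create cat state": 1, "uncreate cat state": 1}

# Six parity checks, each performed at least twice.
PARITY_MEASUREMENTS = 6 * 2


def syndrome_time(S, C):
    """Minimum syndrome time 2S + 12(2S + 24C) for gate times S and C."""
    per_measurement = sum(SINGLE_QUBIT.values()) * S + sum(CNOTS.values()) * C
    return 2 * S + PARITY_MEASUREMENTS * per_measurement


total_ancillae = PARITY_MEASUREMENTS * sum(ANCILLAE.values())
print(total_ancillae)          # ancillae consumed per syndrome measurement
print(syndrome_time(1, 10))    # time in units where S = 1 and C = 10
```

The gate times passed to `syndrome_time` are placeholders; any consistent unit can be used.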

5.2  Concatenated  Codes 

The [[7,1,3]] × [[7,1,3]] two-level concatenated code is measured
in the same way as the [[7,1,3]] code, except the qubits and ancillae
are encoded. For example, each logical ancilla must be prepared in
the following manner:⁶

1  Begin  with  seven  ancillae. 

2  Measure  the  error  syndrome,  and  correct,  as  in  Section  5.1.  At 
this  point,  the  seven  qubits  constitute  a  valid  code  word. 

3  Measure  the  value  of  the  logical  ancilla: 

A  Create  a  cat  state  with  another  seven  ancillae  to  collect  the 
parity  of  the  seven  qubits  in  the  logical  ancilla. 

B  Verify  the  cat  state. 

C  Use  the  cat-state  qubits  as  the  CNOT  target  of  the  qubits 
encoding  the  logical  ancilla. 

D Uncreate the cat state, collecting the parity into a single
qubit.

E With two fresh ancillae, create α|000⟩ + β|111⟩.

F Measure each of these three qubits.

4 Use the majority measured value as the value of the logical
ancilla.

5 If the measurement is |1_L⟩, apply X.

The  error  syndrome  measurement  is  analogous  to  the  singly-encoded 
[[7, 1,3]]  case,  except  that  the  lower-level  encodings  must  be  error 
corrected  between  operations: 

1  Prepare  four  logical  ancillae  in  a  cat  state. 

2 Error correct the four ancillae.

3  Verify  the  cat  state. 

4  Use  the  ancillae  as  the  CNOT  target  of  the  qubits  whose  parity 
is  to  be  measured. 

5  Error  correct  the  four  qubits  in  the  cat  state  and  the  logical  data 
qubits. 

6 Measure each of the four logical cat-state qubits. The parity of
these measurements is the parity of the four encoding qubits.
This step is equivalent to the cat-state deconstruction step for
the singly-encoded case.

As in the singly-encoded case, each parity measurement must be
performed at least twice. The resulting syndrome determines which,
if any, logical qubit has an error. The appropriate X, Z, or Y
operator can be applied to correct the error. Of course, after the
operator is applied to a logical qubit, that qubit must be error-corrected.
Higher levels are error-corrected analogously.

6.  COMMUNICATION  COSTS 

In this section, we derive the primary results of this paper. First,
we model the communication costs of our error correction algorithms
under the near neighbor constraint. We show that there are
too many SWAP operations between upper levels of our tree structures
and that too much error accumulates to be corrected. Second,
we analyze quantum teleportation as an alternative to SWAP
operations for long-distance communication. Finally, we show that
teleportation is necessary both in terms of distance and in terms of
the accumulating probability of correlated errors between redundant
qubits in our code words.

⁶ Fault-tolerant algorithms that avoid the overhead of encoded
ancillae are a topic of future research.

6.1  Error  Correction  Costs 

The error correction algorithms in the previous section assume an
ideal situation, where any qubit can interact with any other qubit.
Usually, qubits can only interact with their near neighbors, so
before applying a two-qubit operator, one of the operand qubits must
be moved adjacent to the other.

One of the easiest ways to move quantum data is to use the SWAP
operator. By applying SWAPs between alternating pairs of qubits,
the values of alternating qubits are propagated in one direction,
while the remaining qubit values are propagated in the reverse
direction. This can be used to supply |0⟩ ancillae for the purpose of
error correction. As a side benefit, this also removes "used"
ancillae. Figure 9 illustrates this method for the three-qubit example,
using two rows of qubits, one for the encoding data qubits and one
for the ancillae.
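The counter-propagating effect of alternating SWAP layers can be seen in a purely classical sketch. It tracks only which value sits at which site of a single line of qubits (Figure 9 uses two rows), and the names `swap_layer` and `run_conveyor` are hypothetical.

```python
def swap_layer(line, offset):
    """Swap neighbouring pairs (offset, offset+1), (offset+2, offset+3), ..."""
    out = list(line)
    for i in range(offset, len(out) - 1, 2):
        out[i], out[i + 1] = out[i + 1], out[i]
    return out


def run_conveyor(line, rounds):
    """Alternate between even-aligned and odd-aligned SWAP layers."""
    for r in range(rounds):
        line = swap_layer(line, r % 2)
    return line


# Values starting on even sites (d*) drift right, those on odd sites (a*) drift left.
start = ["d0", "a0", "d1", "a1", "d2", "a2"]
after_two = run_conveyor(start, 2)
print(after_two)  # ['a0', 'a1', 'd0', 'a2', 'd1', 'd2']
```

Values that reach either end of the line simply wait there; in the architecture those ends would be where fresh ancillae enter and used ancillae leave.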

The same method can be applied to the [[7,1,3]] code. The actual
communication costs depend on the physical implementation used.
The time required for an error correction parity check is

    t_{ecc} = 12\,(t_{cc} + t_{cv} + t_p + t_{cd} + t_m)    (1)

where

t_{cc} is the time for cat state creation;

t_{cv} is the time for cat state verification;

t_p is the time to entangle the cat state with the parity qubits;

t_{cd} is the time to uncreate the cat state; and

t_m is the time to perform a fault-tolerant measurement.

For [[7,1,3]] in the ideal, sea-of-qubits model, t_{cc} is
t_{single} + 3t_{cnot}, t_{cv} is 6(2t_{cnot} + t_{meas}), t_p is
t_{cnot} (four CNOTs performed in parallel), t_{cd} is
3t_{cnot} + t_{single}, and t_{overlap} is t_{ciecat} + t_{meas}, where

t_{single} is the time required for a single-qubit operator;
t_{cnot} is the time required for a CNOT operator;
t_{swap} is the time required for a SWAP operator; and
t_{meas} is the time required for the measurement operator.

If communication by swapping is used,

    t_{cc} = \max(t_{single}, t_{swap}) + 4t_{swap} + 3\max(t_{cnot}, t_{swap}),     (2)
    t_{cv} = \max(t_{single}, t_{swap}) + 18t_{swap} + 12\max(t_{cnot}, t_{swap}),   (3)
    t_p \le 4\max(t_{cnot}, t_{swap}), and                                           (4)
    t_{cd} = 3t_{swap} + 2\max(t_{cnot}, t_{swap}) + t_{single}.                     (5)

In the Kane model, t_{single} < t_{swap} < t_{cnot}, so the overall cost is

    t_{ecc} = 336t_{swap} + 168t_{cnot} + t_{meas}.

Since measurement is fully parallelizable, these times assume
that there are enough measurement functional units to perform
measurement in parallel with the other operations in the error-correction
cycle.

6.2  Multilevel  Error  Correction 

For the concatenated code, the data movement in the upper levels
is more complicated. Although Eq. 1 still holds, each parity
measurement requires the following. Since ancillae are themselves
encoded, they each require their own branch, and the first step is to
create the encoded ancillae, by error correcting, measuring, and if
necessary, inverting. The second step is to create the four-qubit cat
state from the logical ancillae, by applying H to one of the ancillae,
moving a second ancilla through the main branch to the second
rail of the first ancilla, applying CNOT, moving the second ancilla
back, and error-correcting both ancillae. This is repeated for the
second and third ancillae, and the third and fourth ancillae. Since
the ancillae are error corrected along the way, the cat state need not
be verified.

k    SWAP          CNOT          1-Qubit       Measurement
1    1520          288           12            108
2    690,000       120,000       4800          43,000
3    2.7 x 10^8    4.7 x 10^7    1.9 x 10^6    1.7 x 10^7
4    1.1 x 10^11   1.8 x 10^10   7.5 x 10^8    6.7 x 10^9
5    4.3 x 10^13   7.3 x 10^12   3.0 x 10^11   2.7 x 10^12

Table 2: Operations required for an error-correction cycle at
level k.

Next, an ancilla is moved through the main branch to the data
branch that holds a bottom-level encoding. After applying CNOT,
the ancilla is moved back to its own branch, and both it and the
logical data qubit are error-corrected. Since measuring the ancillae
can be performed completely in parallel, all four ancillae are
measured, and the parity of the measurements is the parity of the data
qubits.


For [[7,1,3]] concatenated with itself k times,

    t_{anc,k} = t_{ecc,k-1} + t_{m,k-1}                           (6)
    t_{cc,k}  = [right-hand side illegible in source]             (7)
    t_{cv,k}  = 18\,t_{ecc,k-1} + 28\,t_{b,k}                     (8)
    t_{p,k}   = 8\,t_{ecc,k-1} + 1[?]\,t_{b,k}                    (9)
    t_{cd,k}  = t_{m,k-1}                                         (10)
    t_{m,k}   = [right-hand side illegible in source]             (11)
    t_{b,k}   = 1,                                    k = 1
              = t_{B,arch},                           k = 2       (12)
              = (n + a)\,t_{b,k-2} + t_{B,arch},      k > 2

where the subscript k indicates the level of encoding, t_{anc,k} is the
cost of encoding an ancilla, t_{b,k} is the branch distance between
logical qubits at level k, t_{m,1} is the time required to measure a
singly-encoded qubit, t_{B,arch} is the minimum number of qubits between
two branches for a given architectural model, n is the number of
physical qubits in the non-concatenated code, and a is the number of
ancillae per parity measurement. For concatenated codes, parallel
operation is determined by the ratio of ancillae delivery to ancillae
consumption for a singly-encoded parity check. For [[7,1,3]] and a
single-qubit-wide branch this ratio is around 3. Arranging the
ancillae as in the inset of Figure 7 minimizes the distance that ancillae
must travel.

The recurrence relations given in Eqs. 6 through 12 give an overall
time to perform an error-correction cycle at a given level of
recursion. A similar recurrence relation gives the total number of
operations required. The numbers of operators required for different
levels of encoding are summarized in Table 2, which shows that
the SWAP operator is very important in a realistic model, compared
to the sea-of-qubits model, where SWAPs are not required. In this
realistic model, SWAPs account for over 80% of all operations.
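The 80% figure can be checked directly against the operation counts in Table 2. The sketch below uses the rows for the recursive levels k = 2 through 5 as printed (rounded to two significant figures in the table); the names `TABLE_2` and `swap_share` are hypothetical.

```python
# (SWAP, CNOT, 1-qubit, measurement) operation counts from Table 2.
TABLE_2 = {
    2: (690_000, 120_000, 4_800, 43_000),
    3: (2.7e8, 4.7e7, 1.9e6, 1.7e7),
    4: (1.1e11, 1.8e10, 7.5e8, 6.7e9),
    5: (4.3e13, 7.3e12, 3.0e11, 2.7e12),
}


def swap_share(row):
    """Fraction of all operations in a row that are SWAPs."""
    swap = row[0]
    return swap / sum(row)


shares = {k: swap_share(row) for k, row in TABLE_2.items()}
for k, share in shares.items():
    print(f"level {k}: SWAP share = {share:.1%}")
```

With these table values every level from 2 to 5 comes out slightly above 80%.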


k    Teleportation    Swapping,          Swapping,          Swapping,
                      t_{B,arch} = 22    t_{B,arch} = 61    t_{B,arch} = 285
1    864              1                  1                  1
2    864              22                 61                 285
3    864              77                 194                866
4    864              330                876                4,012
5    864              913                2,317              10,381
6    864              3,696              9,819              44,987

Table 3: Comparison of the cost of swapping an encoded qubit
to the cost of teleporting it. The "swapping" values are b_k, the
distance between adjacent qubits.

Figure 10: Cost of teleportation compared to swapping. The
values  chosen  illustrate  the  break-even  point  for  different  levels 
of  recursion. 


6.3  Teleportation 

Fortunately, we can use quantum teleportation as an alternative
to swapping for communication over long distances. To use
teleportation for our circuit, we must evaluate the number of
computation and communication operations within the teleportation circuit.
By comparing this number of operations with the swapping costs
from the previous section, we can decide at what level k of the tree
to start using teleportation instead of swapping for communication.

6.4  Distance  Tradeoff 

By calculating the number of basic computation and communication
operations necessary to use teleportation for long-distance
communication, we can quantify when we should switch from swapping
to teleportation in our tree structure. Figure 10 illustrates this
tradeoff. We can see that for t_{B,arch} = 22, teleportation should be
used when k ≥ 5.
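The break-even level can be read off Table 3 by finding the first level at which the swapping cost exceeds the fixed teleportation cost of 864. The sketch below does only that lookup; the names `SWAP_COST` and `break_even_level` are hypothetical, and the numbers are the table entries as printed.

```python
TELEPORT_COST = 864  # teleportation column of Table 3, constant in k

# Swapping cost per level k = 1..6, keyed by t_{B,arch}.
SWAP_COST = {
    22: [1, 22, 77, 330, 913, 3696],
    61: [1, 61, 194, 876, 2317, 9819],
    285: [1, 285, 866, 4012, 10381, 44987],
}


def break_even_level(costs, teleport=TELEPORT_COST):
    """Return the first level k whose swapping cost exceeds the teleport cost."""
    for k, cost in enumerate(costs, start=1):
        if cost > teleport:
            return k
    return None  # swapping stays cheaper at every tabulated level


levels = {arch: break_even_level(costs) for arch, costs in SWAP_COST.items()}
print(levels)  # {22: 5, 61: 4, 285: 3}
```

A larger spacing between branches moves the switch to teleportation lower in the tree.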

6.5  Avoiding  Correlated  Errors 

An important assumption in quantum error correction is that errors
in the redundant qubits of a codeword are uncorrelated. That
is, we do not want one error in a codeword to make a second error
more likely. To avoid such correlation, it is important to try not to
interact qubits in a codeword with each other.

Unfortunately,  we  find  that  a  2D  layout  cannot  avoid  indirect 
interaction  of  qubits  in  a  codeword.  At  some  point,  all  the  qubits 
in  a  codeword  must  be  brought  to  the  same  physical  location  in 
order  to  calculate  error  syndromes.  In  order  to  do  this,  they  must 
pass  through  the  same  line  of  physical  locations.  Although  we  can 
avoid  swapping  the  codeword  qubits  with  each  other,  we  cannot 
avoid  swapping  them  with  some  of  the  same  qubits  that  flow  in  the 
other  direction. 

For concreteness, if two qubits of a codeword, d_0 and d_1, both swap
with an ancilla a_0 going in the opposite direction, there is some
probability that d_0 and d_1 will become correlated with each other
through the ancilla. This occurs if both swaps experience a partial
failure. In general, if p is the probability of a failure of a SWAP
gate, the probability of an error from swapping a logical qubit is

    n_k b_k p + \binom{n_k}{2} b_k p^2 + \binom{n_k}{3} b_k p^3 + \cdots,

where b_k is the number of qubits between branches at level k, and
the higher order terms are due to correlation between the qubits.
From this form, it is clear that correlated errors are dominated by
uncorrelated errors when n_k p ≪ 1.
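A short numerical sketch makes the dominance of the first-order term concrete. The values used here are illustrative assumptions, not figures from the paper: n_k is taken to be 7^2 = 49 (assuming n_k counts the physical qubits of a two-level [[7,1,3]] encoding), b_k = 22, and p = 10^-6. The function name `swap_error_terms` is hypothetical.

```python
import math


def swap_error_terms(n_k, b_k, p, max_order=3):
    """Terms C(n_k, j) * b_k * p**j of the series for j = 1 .. max_order."""
    return [math.comb(n_k, j) * b_k * p ** j for j in range(1, max_order + 1)]


# Illustrative (assumed) parameters: two-level code, 22 sites, p = 1e-6.
n_k, b_k, p = 49, 22, 1e-6
terms = swap_error_terms(n_k, b_k, p)

# Ratio of the leading correlated term to the uncorrelated term.
ratio = terms[1] / terms[0]
print(terms)
print(ratio)
```

The ratio of the second term to the first is (n_k - 1)p/2, so the correlated contribution is negligible whenever n_k p is much smaller than 1, and it grows as the code is concatenated further.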

7.  FUTURE  WORK 

Our results have interesting implications for the Threshold Theorem,
effectively increasing the reliability requirements for quantum
operators, particularly SWAP operators. In addition to the teleportation
solution to long-distance communication, it may be possible
to modify the straightforward recursive structure used in quantum
error correction codes to include intermediate error correction steps
in the middle of long chains of SWAP operators. There are, however,
serious challenges in getting the ancillae to all of these intermediate
points in such a layout.

At the lowest level, the largest consumer of ancillae for error
correction is cat-state verification. However, at higher levels, the
cat states themselves are constructed from logical ancillae, each of
which must be error corrected, measured, and the whole cat state
verified. This approach is a straightforward analog to the lowest
level, but there may be more efficient algorithms from the standpoint
of ancilla use.

The proposed teleportation solution assumes that the distribution
of reliable EPR pairs is significantly easier than transporting
arbitrary quantum data. EPR pairs are precommunicated in a pipelined
fashion, then "purified" using an entanglement-concentrating
algorithm that eliminates bad EPR pairs [25]. Quantifying the
reliability and bandwidth of this mechanism is the subject of future study.

Finally,  this  paper  has  focused  on  solid-state  implementations 
with  static  qubits.  There  is  a  proposal  for  scalable  ion-trap  quantum 
computers,  built  using  conventional  microfabrication  techniques, 
where  the  qubits  are  mobile  [17].  How  the  mobility  constraints  of 
such  a  system  compare  to  swapping  with  static  qubits  is  a  subject 
of  future  study. 

8.  CONCLUSION 

Quantum computation is in its infancy, but now is the time to
evaluate quantum algorithms under realistic constraints and derive
the architectural mechanisms and reliability targets that are needed
in order to scale quantum computers to their full potential. This
paper has focused upon the spatial and temporal constraints of
solid-state technologies, and has shown that the recursive construction
for quantum error correction codes requires a long-distance
communication technology such as quantum teleportation. We derived
the tradeoff point between short- and long-distance technologies.
Also, the reliability of the quantum SWAP operation used in
short-distance communication is the dominant factor in system
reliability. These results are a beginning. The next step is moving quantum
computation from theory to practice, unlocking an unprecedented
tool to attack difficult problems.